Historic, archived document 


Do not assume content reflects current 
scientific knowledge, policies, or practices. 


United States 
Department of 
Agriculture 


Forest Service 


Pacific Northwest 
Research Station 


Research Paper
PNW-360
May 1986 



Sampling Designs 
for Estimating the 
Total Number of Fish 
in Small Streams 


David G. Hankin 


Author 


DAVID G. HANKIN is an associate professo 


ty, Arcata, California 95521. The research v 
agreement with the Pacific Northwest R 

Abstract


Hankin, D.G. Sampling designs for estimating the total number of fish in small streams. 
Res. Pap. PNW-360. Portland, OR: U.S. Department of Agriculture, Forest Service, 
Pacific Northwest Research Station. 1986. 33 p. 


A common objective of fisheries research is estimating the total number of fish in small 
streams. The conventional approach involves (1) selecting a small sample of equal- 
length sections of stream, and (2) estimating the total number of fish in each section 
using removal method or mark-recapture estimators. Error of estimation of the total 
number of fish in a stream arises from two sources: (1) extrapolation from the small 
number of sampled sections to the entire stream, and (2) errors of estimation of fish 
numbers within sampled sections. This report shows that errors arising from the first 
source will usually be far larger than those arising from the second source. Total errors 
of estimation can be reduced by making sampled sections equivalent to natural habitat 
units. Entire pools or riffles should be sampled rather than fixed-length sections of 
streams. The relative performances of three alternative sampling designs, which can be 
used when sampled sections are equivalent to natural habitat units, are contrasted in 
terms of accuracy and cost-effectiveness. Accuracy of estimation can be dramatically 
improved if sampling designs account for the usually strong, positive correlation 
between fish numbers and habitat unit sizes. 


Keywords: Sampling designs, population sampling, fish population, fish habitat. 


Summary


The traditional sampling design used to estimate the total number of fish in small 
streams involves (1) dividing a stream into sections of equal length, (2) selecting a 
simple random sample from these sections, and (3) using some population estimator 
within each of the selected sections. In sampling theory jargon, this design is termed a 
two-stage sampling design with equal-sized primary units. Errors of estimation in a 
two-stage design arise from two sources: (1) errors of extrapolation from the small 
number of sampled sections to the entire stream, and (2) errors of estimation of fish 
numbers within sampled sections. Error arising from the first source is measured by the 
variation among (estimated) primary unit totals and is termed first-stage variance. Error 
arising from the second source is measured by the average error of population 
estimation within selected sections and is termed second-stage variance. In the usual 
small-stream context, first-stage variance is very large compared to second-stage 
variance. 


Large first-stage variance for the traditional design results from equal-sized primary 
units being of unequal habitat quality. Certain selected sections may consist primarily of 
riffle habitat, whereas others may consist primarily of pool habitat. Because densities of 
fish usually vary considerably among habitat types, the equal-sized primary unit design 
results in substantial variation among the numbers of fish in primary units and, thus, in 
large first-stage variance. 


Stratification can be used to improve the precision of estimation of the total number of 
fish in small streams. A stream can be mapped and strata formed on the basis of habitat 
unit types; for example, pools or riffles. Independent samples can then be drawn from 
each constructed stratum, and independent estimates of the total number of fish within 
each stratum can be made. An estimate of the total number of fish in the entire stream 
can then be obtained by summing all stratum estimates; estimated variance of this 
estimated total can be obtained by summing variance estimates for independently 
estimated stratum totals. 


If stratification is used to improve precision of estimates, then the natural habitat units 
become the primary sampling units. Thus, a two-stage sampling design appropriate for 
unequal-sized primary units must be used within each habitat stratum. Three alternative 
two-stage sampling designs, appropriate when primary units are of unequal sizes, are 
presented in this report. Two of these designs can substantially improve precision and 
accuracy of estimation when (1) the range in primary unit sizes is large (> four-fold), 
and (2) the correlation between primary unit totals and primary unit sizes is large (r>0.5). 
Improved accuracy of both designs is achieved through a significant reduction in 
first-stage variance as compared to designs that fail to account for the correlation 
between fish numbers and habitat unit sizes. 


Adoption of two-stage sampling designs based on unequal-sized primary units also 
gives important biological improvements. It is possible to quantitatively study the 
relationships between fish numbers and the sizes and qualities of habitat units. This kind 
of information is critical for understanding the dynamics of the abundance of fish in small 
streams. 


Contents


Introduction
Basic Sampling Theory Concepts
Expected Value, Variance, and Mean Square Error
Stratification and Relative Efficiency
Multistage Sampling Designs
Traditional Two-Stage Sampling Design
Alternative Two-Stage Sampling Designs
Design A: Two-Stage SRS
Design B: Ratio Estimation
Design C: PPS Without Replacement
Determining the Best Choice Among Alternative Designs
Realistic Applications
Discussion
English Equivalents
Literature Cited
Appendix 1: Computation of Selection Probabilities for the PPS Design
Appendix 2: Two-Stage Estimators for Alternative Designs
Appendix 3: Details of the Realistic Application of Alternative Designs
Appendix 4: Estimation of Total Biomass


Introduction 


Fisheries biologists are often required to provide estimates of the total number of fish in 
small streams. The purposes for which these estimates are made will influence the 
required accuracy of estimates and the costs of obtaining estimates. For purposes of 
crude inventory work, estimates that are within ±50 percent of the true quantity (with 
95 percent probability) may be adequate. But for research purposes, estimates may have 
to be within ±10 percent of the true quantity (see Robson and Regier 1964). For 
example, evaluation of the impacts and cost-effectiveness of current efforts at rehabilitation 
or enhancement of anadromous salmonid populations will surely require estimates 
of fish abundance that are within ±10-20 percent of the true abundance. Have 
rehabilitation measures actually increased the number of adult fish spawning in 
streams? Have the benefits of rehabilitation exceeded the costs? Answers to such 
questions can be provided only through comparison of estimates of abundance before 
and after rehabilitation. If errors of estimation of abundance are large, then it will be 
difficult or impossible to detect changes in abundance and to evaluate rehabilitation 
programs. 


Development of schemes designed to reduce errors of estimation of fish abundance 
requires knowledge of both population estimators and sampling theory. Most fisheries 
biologists have received training in the application of population estimators, and this 
report assumes that readers are familiar with standard mark-recapture and removal 
method estimators (see Seber 1982 for a thorough review of population estimation 
methods; see Everhart and Youngs 1981, chapter 6, for a brief review). These population 
estimators are used widely and effectively in small streams. Knowledge of how to apply 
these population estimators, however, is inadequate for estimating fish abundance. 
Usually only a very small fraction of a stream can be sampled. How and where to select 
sample sections of stream must be decided before population estimators can be used 
within selected stream sections. 


Sampling theory concerns itself with methods that determine how and where to select 
samples, and with how those methods can influence the quality of inferences drawn 
from samples. The purpose of this report is to illustrate that basic sampling theory 
principles can be used to develop cost-effective programs for accurate estimation of fish 
abundance. In most cases, these principles are simple and intuitive; in other cases they 
are more complex, but still have strong intuitive appeal. Because few fisheries biologists 
have studied sampling theory, the intuitive basis of these principles will be stressed and 
simple numerical examples, rather than formal mathematical developments, will be 
used for illustrative purposes. The initial material on basic sampling theory concepts is 
provided for the benefit of readers with essentially no sampling theory background and 
may be skipped by some readers. Boldface type is used throughout this report to 
indicate the introduction of important sampling theory concepts or terms. 


Although the bulk of this report is devoted to examination of alternative sampling designs 
for estimation of the total number of fish, a fisheries biologist may often want to estimate 
the total biomass of fish. Appendix 4 is devoted to this topic. Readers interested in a 
more formal presentation of the alternative sampling designs presented in this report are 
referred to Hankin (1984) and to references cited in that paper. 


Basic Sampling Theory Concepts


Expected Value, Variance, and Mean Square Error


An estimator, when applied in practice, can result in many possible estimates. For 
example, if a certain number of fish were marked and then released in a section of 
stream, many estimates of population size would be possible based on distinctive 
recapture samples. Due to chance, the fraction of marked fish in a given sample may 
differ from the true marked fraction in the total population; this will cause a population 
estimator to generate many different sample-based estimates of population size. The 
quality of an estimator is therefore assessed on the basis of its overall average 
performance. In the case of a mark-recapture estimator, this overall average perform- 
ance may be conceptualized by imagining (1) releasing M fish into a population of size 
Y, drawing a recapture sample of size C, and estimating population size based on the 
marked fraction in C; and (2) repeating this same experiment an infinite number of times. 
If the relative frequency of particular estimates of population size were then plotted 
against estimated population size, a characteristic distribution would result. This 
distribution is known as the sampling distribution of the population estimator and may 
be characterized, in part, by its mean and variance. In addition, however, one also 
characterizes the relationship of this distribution to the true population size that the 
estimator is designed to estimate. 
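
The thought experiment above can be approximated by simulation. The following is a minimal sketch, not part of the original report: it assumes the simple marked-fraction estimate M·C/R (commonly called the Petersen estimator), where R is the number of marked fish in the recapture sample. It replaces the "infinite number of times" with a finite number of repetitions. The function name and the values of Y, M, C, and the repetition count are hypothetical choices for illustration.

```python
import random
from statistics import mean, pvariance


def petersen_estimates(Y, M, C, reps, seed=1):
    """Simulate repeated mark-recapture experiments; return the estimates."""
    rng = random.Random(seed)
    results = []
    for _ in range(reps):
        # Fish numbered 0..M-1 are the marked fish; draw C fish without replacement.
        sample = rng.sample(range(Y), C)
        recaptured = sum(1 for fish in sample if fish < M)
        if recaptured == 0:
            continue  # the estimate is undefined when no marked fish are recaptured
        results.append(M * C / recaptured)
    return results


# Hypothetical values: true population Y = 500, M = 100 marked, recapture sample C = 100.
estimates = petersen_estimates(Y=500, M=100, C=100, reps=2000)
print("mean of estimates:", round(mean(estimates), 1))
print("variance of estimates:", round(pvariance(estimates), 1))
```

A histogram of `estimates` approximates the sampling distribution described above. Comparing its mean with the true Y shows how the distribution relates to the quantity being estimated.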


The quality of an estimator is judged on the basis of its sampling distribution by three 
principal criteria: 

1. Bias: the average departure of estimates from the true quantity being estimated. 

2. Variance: the average (squared) variation of estimates from the average of all 
estimates. 

3. Mean square error: bias (squared) plus variance. 


The statistical meaning of bias is much the same as its meaning in everyday language; 
variance and mean square error can also be defined in everyday terms that help clarify 
their meanings. Precision is the reciprocal of variance. Thus, if there is a great deal of 
variation among possible estimates (variance is large), then an estimator has low 
precision; if variance is small, then an estimator has high precision. But because 
variance measures only variation among the possible estimates (from the mean of all 
estimates), precision is not, by itself, a satisfactory measure of estimator performance. 
Mean square error is a measure of the accuracy of an estimator and is defined as the 
averaged (squared) variation between estimates and the true quantity being estimated. 
An estimator with small mean square error has high accuracy, whereas an estimator 
with large mean square error has low accuracy. Note that low accuracy can result from 
(1) high bias and high precision, (2) low bias and low precision, or (3) some intermediate 
combination of bias and precision. 
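
The relation among the three criteria can be checked numerically. The sketch below is illustrative only: the function name and the four equally likely estimates are hypothetical, not taken from the report. It computes bias, variance, and mean square error directly from their definitions and confirms that mean square error equals bias squared plus variance.

```python
def bias_variance_mse(estimates, probabilities, true_value):
    """Bias, variance, and mean square error of a discrete sampling distribution."""
    expected = sum(e * p for e, p in zip(estimates, probabilities))
    bias = expected - true_value
    # Variance: average squared deviation from the mean of all estimates.
    variance = sum(p * (e - expected) ** 2 for e, p in zip(estimates, probabilities))
    # Mean square error: average squared deviation from the true quantity.
    mse = sum(p * (e - true_value) ** 2 for e, p in zip(estimates, probabilities))
    return bias, variance, mse


# Hypothetical sampling distribution: four equally likely estimates of a true value of 10.
bias, variance, mse = bias_variance_mse([8.0, 10.0, 12.0, 14.0], [0.25] * 4, 10.0)
print(bias, variance, mse)  # 1.0 5.0 6.0
```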


An intuitive understanding of the concepts of bias, variance, and mean square error can 
be conveyed through a bullseye analogy. Consider each of the four diagrams in figure 
1 as patterns of darts thrown by contestants in a contest at a local tavern. Figure 1A 
depicts the pattern of a highly skilled dart thrower. It is tightly packed (small variance) 
and centers about the bullseye (the quantity to be “estimated”). The pattern has (1) low 
bias, (2) high precision, and (3) high accuracy. Figure 1B shows the pattern of a rival 
dart thrower who is also highly skilled, but is using a new set of darts and has yet to 
adjust for the unfamiliar balance of the new darts. His pattern is highly precise but, 
because it is biased (that is, off target), it is less accurate than the pattern in figure 1A. 
Figures 1C and 1D can be thought of as dart patterns for these same two individuals 
after each has consumed two pitchers of beer. The first individual’s pattern (fig. 1C) 
remains unbiased (it still centers about the bullseye), but it is now extremely imprecise 
and, as a result, extremely inaccurate. Figure 1D shows that the second individual’s dart 
pattern retains its bias, is far less precise than in figure 1B, and is even less accurate 
than the first inebriated dart thrower’s pattern. The analogy between dart patterns and 
the sampling distribution of an estimator illustrates the concepts of bias, precision, and 
accuracy in an effective conceptual manner and helps prepare one for the quantitative 
treatment that follows. 


Figure 1. The bullseye analogy. Various patterns of darts at a 
target: (A) high precision, low bias, high accuracy; (B) high precision, 
high bias, medium accuracy; (C) low precision, low bias, low 
accuracy; and (D) low precision, high bias, lowest accuracy. 


Most of sampling theory concerns itself with situations in which there are only a finite 
number of possible samples that can be drawn from a sampling universe of finite size. 
A sampling universe consists of the total number of units from which samples are drawn 
and the attributes of these units. Units may have many attributes. For example, if all 
pools in a stream constituted the units of a sampling universe, then units could have 
attributes such as number of fish, area, volume, and average and maximum depth. The 
objective of sampling is to estimate some collective attribute of the sampling universe on 
the basis of a small number of units (that is, from a sample). Examples of collective 
attributes of the pool sampling universe include total area of all pools, total number of 
fish in all pools, and mean number of fish per pool. 


Each of the possible distinct samples of size n units that can be drawn from a 
sampling universe of size N units has an associated probability of being drawn. 
This probability will depend on the selection method used to draw the 
sample. In the simplest case, a sample is drawn by simple random sampling (SRS); by 
this selection method no unit can appear more than once in the sample (a without- 
replacement method), and all possible samples are equally likely. 
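
Simple random sampling can be sketched by enumerating every possible sample. The snippet below is a minimal illustration, not part of the original report. It uses the four-pool universe of the report's examples (2, 5, 7, and 14 fish in units 1 through 4) and lists each without-replacement sample of size 2 with its equal selection probability. The variable names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Fish per unit in the four-pool universe used in the report's examples.
fish_per_unit = {1: 2, 2: 5, 3: 7, 4: 14}
n = 2  # sample size

# Every distinct without-replacement sample of n units.
samples = list(combinations(sorted(fish_per_unit), n))

# Under SRS all possible samples are equally likely.
probability = Fraction(1, len(samples))

for units in samples:
    values = [fish_per_unit[u] for u in units]
    print(units, values, probability)
```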


Sampling universe 

Unit number:    1    2    3    4
Fish/unit:      2    5    7   14
Mean number of fish/unit = 7 = $\mu$

Possible    Units in    Sample     Sample mean
sample      sample      values     ($\bar{y}_t$)      $(\bar{y}_t - \mu)^2$
   1          1,2        2,5          3.5          12.25
   2          1,3        2,7          4.5           6.25
   3          1,4        2,14         8.0           1.00
   4          2,3        5,7          6.0           1.00
   5          2,4        5,14         9.5           6.25
   6          3,4        7,14        10.5          12.25

Totals                               42.0          39.00


Example 1. Simple random sampling (SRS). Possible samples of 
size 2 selected from a sampling universe of size 4; units in samples; 
unit values in samples; sample means; and squared deviations of 
sample means from true mean, $(\bar{y}_t - \mu)^2$. 


Example 1 quantitatively illustrates the concepts of expected value and variance when 
samples of size n = 2 pools are drawn from a sampling universe of size N = 4 pools 
by SRS. The universe attribute of interest is the mean number of fish per pool ($\mu = 7$). 
There are six possible simple random samples and each has probability of one-sixth. 
The expected value of an estimator is denoted by $E(\hat{\theta})$, where $\theta$ is the quantity or 
attribute of interest, and the “caret” or “hat” (over $\theta$) indicates an estimator of $\theta$. 
Expected value is calculated as: 

$$E(\hat{\theta}) = \sum_{t=1}^{T} \hat{\theta}_t P_t$$

where: T = total number of possible samples; t = 1, 2,...,T; 
$\hat{\theta}_t$ = the particular estimate of $\theta$ generated from the t-th sample; and 
$P_t$ = probability of selecting the t-th sample. 


In example 1, the estimator of the true mean number of fish per pool is denoted (as a 
notational convention) by $\bar{y}$, rather than $\hat{\mu}$, and allows estimation of $\mu$ from a sample 
(here of size n = 2 pools): 

$$\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$$

where $y_i$ = the number of fish in pool i, and the summation is over those units that 
appear in a sample. 


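SRS is often used because it results in unbiased estimators. An unbiased estimator is 
an estimator such that $E(\hat{\theta}) = \theta$; that is, the average value of all possible estimates is 
the true value of interest. In example 1: 

$$E(\bar{y}) = \sum_{t=1}^{6} \bar{y}_t P_t = (1/6) \sum_{t=1}^{6} \bar{y}_t = (1/6)(42) = 7 = \mu .$$


The variance of an estimator, $\hat{\theta}$, of a true quantity, $\theta$, is denoted by $V(\hat{\theta})$ and is 
calculated as: 

$$V(\hat{\theta}) = \sum_{t=1}^{T} [\hat{\theta}_t - E(\hat{\theta})]^2 P_t .$$

Mean square error, as defined earlier, replaces the expected value with the true quantity: 

$$MSE(\hat{\theta}) = \sum_{t=1}^{T} (\hat{\theta}_t - \theta)^2 P_t .$$


For an unbiased estimator, $E(\hat{\theta}) = \theta$, so that $V(\hat{\theta}) = MSE(\hat{\theta})$. For a biased estimator, 
$E(\hat{\theta}) \neq \theta$ and $BIAS(\hat{\theta}) = E(\hat{\theta}) - \theta$ (which may be positive or negative). Thus, the 
accuracy of a biased estimator must be measured by $MSE(\hat{\theta})$; accuracy of an unbiased 
estimator can be measured simply by $V(\hat{\theta})$. 


In example 1, the variance of $\bar{y}$ can be calculated as: 

$$V(\bar{y}) = \sum_{t=1}^{6} (\bar{y}_t - 7)^2 (1/6) = (1/6)(39) = 6.5 .$$


$MSE(\bar{y})$ would be the same as $V(\bar{y})$ for example 1 because $E(\bar{y}) = \mu$. 


Stratification and Relative Efficiency

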

In many cases, the precision of estimators can be improved by stratification. Stratifica- 
tion consists of breaking a sampling universe into two or more groups of units (strata) 
and then drawing independent samples from each stratum. The objective of stratifica- 
tion is to group similar units in their own stratum so that variation within constructed 
strata is small compared to variation between strata. For example, if units were riffles 
and pools in a stream, then it would make sense to separate units into habitat type strata 
and to draw independent samples from within each habitat type stratum. This stratifica- 
tion would be effective because densities of fish would usually be different in pools than 
in riffles. Variation of densities of fish (on a per unit area or volume basis) would be 
smaller within the pool and riffle strata than variation in densities of fish between pools 
and riffles. 


                     Stratum I    Stratum II
Units in stratum:      1,2          3,4
Fish/unit:             2,5          7,14

Possible    Sample     Sample     Sample mean
sample      units      values     ($\bar{y}_t$)      $(\bar{y}_t - \mu)^2$
   1         1,3        2,7          4.5           6.25
   2         1,4        2,14         8.0           1.00
   3         2,3        5,7          6.0           1.00
   4         2,4        5,14         9.5           6.25

Totals                              28.0          14.50


Example 2. Stratified random sampling. One possible stratification 
of the sampling universe presented in example 1. Possible stratified 
random samples of size 2 (1 unit selected from each stratum); units 
in samples; unit values in samples; sample means; and squared 
deviations of sample means from the true mean, $(\bar{y}_t - \mu)^2$. 


Example 2 shows one possible stratification of the sampling universe presented in 
example 1. Two strata have been formed: stratum | contains the two pools that have 
fewer fish; stratum II contains the two pools that have more fish. If one were to draw a 
single pool from each of the two strata, and then estimate the mean number of fish per 
pool, one would be using stratified random sampling. Stratified random sampling thus 
involves selection of units by simple random sampling within each constructed stratum. 
In contrast, in SRS (without stratification) units are selected by simple random sampling 
from all of the units in the sampling universe. 


When the number of units within each stratum is the same (stratum sizes are equal), the 
stratified estimator for the mean number of fish per pool is the same as that for SRS 
(without stratification): 

$$\bar{y}_{st} = \frac{1}{n} \sum_{i=1}^{n} y_i$$


The nature of the selection method has reduced, however, the number of possible 
samples that could be drawn and eliminated the possibility of drawing samples that 
contained the units (1,2) or (3,4). These samples made the largest contribution to 
variance in example 1. In example 2, there are only four possible distinct samples; each 
sample is equally likely and has probability of one-fourth. The stratified estimator is 
also unbiased: 

$$E(\bar{y}_{st}) = \sum_{t=1}^{4} \bar{y}_t P_t = (1/4)(28) = 7 = \mu .$$


And the variance of the stratified estimator is substantially less than for SRS (without 
stratification): 

$$V(\bar{y}_{st}) = \sum_{t=1}^{4} [\bar{y}_t - E(\bar{y}_{st})]^2 P_t = \sum_{t=1}^{4} (\bar{y}_t - \mu)^2 P_t = \sum_{t=1}^{4} (\bar{y}_t - 7)^2 (1/4) = (1/4)(14.50) = 3.625 .$$


As example 3 illustrates, a poor stratification can lead to a less precise estimator than 
SRS (without stratification). In example 3, the stratified estimator is again unbiased, but 
$V(\bar{y}_{st}) = 9.25 > V(\bar{y}_{srs}) = 6.50$. The stratification in example 3 performed poorly 
because the most dissimilar units (that had 2 and 14 fish) were grouped in the same 
stratum. There is thus no assurance that stratification will improve the precision of 
estimators. Stratified random sampling will be more precise than SRS when the average 
variation within strata is less than the average variation between strata. 
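
The comparisons among examples 1 through 3 can be checked by direct enumeration. The sketch below is an illustration, not part of the original report. It lists every equally likely sample under SRS and under the two stratifications, then computes the expected value and variance of the sample mean for each. The helper name `mean_and_variance` is hypothetical.

```python
from itertools import combinations, product

# Fish per unit in the four-pool universe of examples 1 through 3.
fish = {1: 2, 2: 5, 3: 7, 4: 14}


def mean_and_variance(samples):
    """Expected value and variance of the sample mean over equally likely samples."""
    means = [sum(fish[u] for u in s) / len(s) for s in samples]
    expected = sum(means) / len(means)
    variance = sum((m - expected) ** 2 for m in means) / len(means)
    return expected, variance


srs = list(combinations(fish, 2))        # example 1: any 2 of the 4 units
good = list(product((1, 2), (3, 4)))     # example 2: one unit from each stratum
poor = list(product((1, 4), (2, 3)))     # example 3: dissimilar units share a stratum

for label, samples in [("SRS", srs), ("example 2", good), ("example 3", poor)]:
    print(label, mean_and_variance(samples))
# SRS (7.0, 6.5)
# example 2 (7.0, 3.625)
# example 3 (7.0, 9.25)
```

All three designs are unbiased for the true mean of 7. Only the variance changes with how units are grouped into strata.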


                     Stratum I    Stratum II
Units in stratum:      1,4          2,3
Fish/unit:             2,14         5,7

Possible    Sample     Sample     Sample mean
sample      units      values     ($\bar{y}_t$)      $(\bar{y}_t - \mu)^2$
   1         1,2        2,5          3.5          12.25
   2         1,3        2,7          4.5           6.25
   3         4,2        14,5         9.5           6.25
   4         4,3        14,7        10.5          12.25

Totals                              28.0          37.00


$$E(\bar{y}_{st}) = \sum_{t=1}^{4} \bar{y}_t P_t = (1/4)(28) = 7 = \mu$$

$$V(\bar{y}_{st}) = \sum_{t=1}^{4} (\bar{y}_t - \mu)^2 P_t = (1/4)(37.00) = 9.25$$


Example 3. An alternative stratification of the sampling universe 
presented in example 1. Possible stratified random samples of size 
2 (1 unit selected from each stratum); units in samples; unit values 
in samples; sample means; squared deviations of sample means 
from the true mean, $(\bar{y}_t - \mu)^2$; expected value ($E(\bar{y}_{st})$) and variance 
($V(\bar{y}_{st})$) of sample mean. 


To compare the performances of alternative methods of selecting samples and 
estimating quantities of interest (alternative sampling designs), sampling theorists 
have devised a number of measures, one of which is relative efficiency (RE). For a 
fixed sample size, n, the relative efficiency of sampling design b compared to sampling 
design a is defined as: 

$$RE(b/a) = V_a(\hat{\theta}) / V_b(\hat{\theta}) ;$$

where the subscripts denote designs a or b. The relative efficiency of the stratification 
design in example 2 compared to SRS without stratification (example 1) was about 
1.79 (6.50/3.625); the stratification design used in example 3 compared to example 1 
resulted in a relative efficiency of only 0.70 (6.50/9.25). Thus, the stratification used in 
example 2 was almost twice as efficient as SRS (example 1), whereas the stratification 
used in example 3 was only about two-thirds as efficient as SRS. The stratification 
used in example 2 was more efficient than SRS in the sense that, for the same 
sample size and the same amount of sample information, an estimator of nearly 
twice the precision was obtained. 


Multistage Sampling Designs


In the simple examples provided above, it was implicitly assumed that after a pool (unit) 
was selected, the number of fish in that pool could be counted without error. This 
assumption normally cannot be met when sampling small streams. Instead, within each 
selected unit some population estimator must be used to estimate the number of fish 
present. Thus, estimation of the total number of fish in a small stream requires (at least) 
two-stage sampling. Units are selected in the first stage, and fish numbers within 
selected units are estimated in the second stage. Two-stage sampling designs are the 
simplest kinds of multistage sampling designs. Estimation of the total number of fish in 
a moderate-sized or large stream might require three stages of sampling: first stage, 
selection of several long (10,000 m) sections of stream; second stage, selection of 
several short (100 m) sections within each long section selected at the first stage; and 
third stage, use of mark-recapture or removal method population estimators within 
each short section selected at the second stage. Errors of estimation arise at each stage 
of sampling, and the mathematical complexity of estimators increases with the number 
of stages. 


This report contrasts the performances of four alternative two-stage sampling designs 
that could be used to estimate the total number of fish in small streams. Although three 
stages of sampling may be required for moderate-sized streams, restriction to just two 
stages of sampling will minimize mathematical complexity and will allow for a sound 
conceptual understanding of multistage sampling. Two stages of sampling may be 
entirely adequate for most small streams, or for substantial reaches of larger streams 
when interest lies solely within those reaches. 


Traditional Two-Stage Sampling Design


The traditional approach to estimating the total number of fish in small streams illustrates 
the simplest type of two-stage sampling design. The total length of a stream is first 
divided into N sections of equal length, and a simple random sample of n sections is 
selected. Then, within each selected section (primary unit), some population estimator 
is used to estimate the number of fish present (the primary unit total) and to determine 
an estimated variance for this estimated total. Errors of estimation in this design arise 
from two sources: (1) extrapolation from the few primary units that are sampled to the 
entire stream length; and (2) errors of estimation of primary unit totals. The first source 
of error is measured by the average variation among (estimated) primary unit totals and 
is termed first-stage variance; the second source of error is measured through 
(estimated) variances of estimated primary unit totals (based on population estimator 
formulas) and is termed second-stage variance. 


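Formulas appropriate when using this traditional design, termed a two-stage design with 
equal-sized primary units, are (Bohlin 1981; Cochran 1977, p. 300-303): 

$$\hat{Y} = \frac{N}{n} \sum_{i=1}^{n} \hat{Y}_i ; \qquad (1)$$

$$V(\hat{Y}) = \frac{N(N-n)}{n} \cdot \frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}{N-1} + \frac{N}{n} \sum_{i=1}^{N} V(\hat{Y}_i) ; \text{ and} \qquad (2)$$

$$\hat{V}(\hat{Y}) = \frac{N(N-n)}{n} \cdot \frac{\sum_{i=1}^{n} (\hat{Y}_i - \bar{\hat{Y}})^2}{n-1} + \frac{N}{n} \sum_{i=1}^{n} \hat{V}(\hat{Y}_i) ; \qquad (3)$$

where: $\bar{Y} = \sum_{i=1}^{N} Y_i / N$, and $\bar{\hat{Y}} = \sum_{i=1}^{n} \hat{Y}_i / n$. 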


The first term in equation (2) measures first-stage error (variance); the second term 
measures second-stage error (variance). Equation (2) gives the true variance of the 
sampling distribution for $\hat{Y}$ and has a single, unique value. In contrast, equation (3) is an 
estimator for $V(\hat{Y})$ that can take on many possible sample-based values depending on 
the particular samples that are selected. 


A simple numerical example will help illustrate the nature of sample-based calculations 
for this traditional design. Suppose that a small tributary stream is sampled and that it 
has a total length of 10,000 m. Five 100-m sections are selected by SRS; within each 
selected section the two-pass Seber-Le Cren removal method estimator (based on 
electrofishing; see Everhart and Youngs 1981, p. 107; appendix 3) is used to estimate 
the number of fish present. Thus, N = 10,000/100 = 100, and n = 5. Suppose that the 
following estimates are obtained for the sample sections: 


Section number    Population estimate ($\hat{Y}_i$)    Estimated variance ($\hat{V}(\hat{Y}_i)$)
      1                     150                          420
      2                     350                          980
      3                     550                        1,540
      4                     200                          560
      5                     250                          700
Totals                    1,500                        4,200


Estimated variances are consistent with a fairly low electrofishing capture probability of 
0.50. The estimated total number of fish in the stream would be: 


$$\hat{Y} = \frac{N}{n} \sum_{i=1}^{n} \hat{Y}_i = \frac{100}{5} (1{,}500) = 30{,}000 .$$


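The estimated variance of this estimated total would be calculated using equation (3): 

$$\hat{V}(\hat{Y}) = \frac{100 (100-5)}{5} \cdot \frac{\sum_{i=1}^{5} (\hat{Y}_i - 300)^2}{5-1} + \frac{100}{5} (4{,}200)$$

$$= 4.75 \times 10^7 + 0.0084 \times 10^7 = 4.758 \times 10^7 .$$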


Assuming normality of the sampling distribution for $\hat{Y}$, the 95-percent confidence interval 
for the total number of fish in the stream would be constructed as: 

$$\hat{Y} \pm t_{(n-1),0.95} \sqrt{\hat{V}(\hat{Y})} .$$

This would give 30,000 ± 2.78·6,898, or 30,000 ± 19,176. This is hardly a satisfactory 
confidence interval for most purposes, but it is based on having sampled only 5 percent 
of the total stream length. 


The important thing here is that virtually all the estimated variance arises from 
variation among estimated primary unit totals (the first term in equation (3)). The 
contribution from the second term is negligible, even though electrofishing capture 
probability is poor and equals 0.50. Had fish in each primary unit been enumerated 
rather than estimated, there would have been no variance contribution from the second 
stage of sampling. Estimated variance of the estimated total would have been little 
affected in this case. The contribution from the first term in equation (3) would remain 
exactly as it is (assuming that estimated primary unit totals were equal to the true 
primary unit totals). 
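
The arithmetic of this worked example can be reproduced in a few lines. The sketch below is an illustration, not part of the original report. It computes the estimated total and splits the estimated variance into its first-stage and second-stage terms, following the calculation shown above. The function name is hypothetical, and the tabled t value of 2.78 (4 degrees of freedom) is hard-coded as in the text rather than computed.

```python
from math import sqrt


def two_stage_total(N, estimates, variances):
    """Estimated total plus first- and second-stage terms of its estimated variance."""
    n = len(estimates)
    total = N / n * sum(estimates)
    mean_estimate = sum(estimates) / n
    # Sample variance among the estimated primary unit totals.
    s2 = sum((y - mean_estimate) ** 2 for y in estimates) / (n - 1)
    first_stage = N * (N - n) / n * s2
    second_stage = N / n * sum(variances)
    return total, first_stage, second_stage


N = 100                                   # 10,000 m of stream in 100-m sections
estimates = [150, 350, 550, 200, 250]     # population estimates per sampled section
variances = [420, 980, 1540, 560, 700]    # estimated variances of those estimates

total, first_stage, second_stage = two_stage_total(N, estimates, variances)
half_width = 2.78 * sqrt(first_stage + second_stage)  # tabled t value for 4 df

print(total)                        # 30000.0
print(first_stage, second_stage)    # 47500000.0 84000.0
print(round(half_width))            # about 19,177; the text rounds the square root first and reports 19,176
print(round(second_stage / (first_stage + second_stage), 4))  # second-stage share
```

The second-stage term contributes less than 0.2 percent of the estimated variance here. This is the numerical basis for the statement that errors of extrapolation dominate.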


This simple example illustrates that errors of extrapolation are likely to be far greater 
than errors of estimation within selected stream sections. This fact has been generally 
unappreciated by fishery biologists who have been preoccupied with electrofishing 
capture probability, or with violations of mark-recapture assumptions (second-stage 
considerations). The importance of the sampling design itself, as it influences the 
magnitude of first-stage variance, has been largely ignored. 


The traditional two-stage design with equal-sized primary units has at least the following 
biological and statistical flaws: 

1. Selected stream sections, when of equal length, will inevitably include mixtures of 
habitat types. 

2. Placement of block nets to delimit stream sections may displace fish from the section 
to be sampled. One may often be estimating the number of fish remaining in the 
section, rather than the original number present. 

3. One or both “ends” of selected sections may fall midway in a deep pool where it may 
be impossible to set block nets. If stream sections were expanded in length, or moved 
upstream or downstream, in response to this dilemma, a purposive decision would 
have been made. This would destroy both the intent and the statistical validity of the 
sampling design itself. 

4. The traditional two-stage sampling design generates large first-stage variance and 
offers only one way to reduce variance of an estimated total. The number of sampled 
sections must be increased to reduce estimator variance, which will significantly 
increase the total cost of obtaining estimates. 


Alternative Two-Stage Sampling Designs


The large variation among primary unit totals in the traditional two-stage design results 
because sections, while of equal length, are usually not of equal habitat quality. One 
section may include primarily riffle habitat, whereas another section may include 
primarily pool habitat. The total number of fish per primary unit is thus a highly variable 
and unstable quantity across units because densities of fish per unit area (or per unit 
volume) vary considerably among habitat types. This makes the squared differences 
between $Y_i$ and $\bar{Y}$ large (the first term in equation (2)). 


Stratification can be effectively used to help reduce first-stage variance. If a stream were 
mapped into habitat units (entire pools, entire riffles) and units stratified by habitat type, 
then samples could be drawn independently from each habitat type stratum. Variation 
among the mean number of fish per unit area (or per unit volume) should be much 
smaller within each habitat stratum than variation in mean densities of fish between 
habitat strata. Given estimates of the total number of fish within each stratum ($\hat{Y}_h$), an
estimate for the total number of fish in the entire stream can be calculated simply by
summing stratum-specific estimates across all strata:

$$\hat{Y} = \sum_{h=1}^{L} \hat{Y}_h ,$$

where $L$ denotes the number of habitat strata.

An estimate for the variance of $\hat{Y}$ can be calculated by summing stratum-specific
variance estimates across all strata:

$$\hat{V}(\hat{Y}) = \sum_{h=1}^{L} \hat{V}(\hat{Y}_h) .$$

Simple addition of stratum-specific estimates is justified by the independence of 
sampling in strata. 
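
The summation across strata described above can be sketched in a few lines of Python. This is only an illustration: the class name `StratumEstimate`, the function `combine_strata`, and the stratum values are hypothetical and are not taken from the report.

```python
from dataclasses import dataclass


@dataclass
class StratumEstimate:
    name: str
    total: float      # estimated total number of fish in the stratum
    variance: float   # estimated variance of that stratum total


def combine_strata(strata):
    """Sum stratum totals and variances (justified when strata are sampled independently)."""
    total = sum(s.total for s in strata)
    variance = sum(s.variance for s in strata)
    return total, variance


# Hypothetical stratum estimates, for illustration only (not data from the report).
strata = [
    StratumEstimate("pools", total=1500.0, variance=40000.0),
    StratumEstimate("riffles", total=400.0, variance=9000.0),
]
stream_total, stream_variance = combine_strata(strata)
print(stream_total, stream_variance, stream_variance ** 0.5)
```

The square root of the summed variance gives a standard error for the whole-stream estimate under the same independence assumption.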


If stratification is used, however, then habitat units within each habitat stratum will not 
be contiguous (for example, a riffle or run would separate two pools within the pool 
stratum). Also, the habitat units themselves, which are intuitively appealing primary 
units, will be of variable sizes. If the sizes of natural habitat units are allowed to dictate 
the sizes of the primary sampling units, then primary units will be of unequal sizes (in 
contrast to the traditional two-stage design) and the complexity of appropriate sampling 
designs will be increased. But, allowing primary units to vary in size according to the 
sizes of the natural habitat units has at least the following advantages: 

1. Habitat types will not be mixed among or within sampled primary units because of 
stratification and because the natural habitat units are equivalent to the primary 
sampling units. 

2. Placement of block nets to delimit primary units will be less likely to displace fish from 
primary units because fish will tend to seek shelter within their natural habitat units. 




3. Estimated numbers of fish in sampled habitat units can be related to the sizes and 
types of habitat units. 

4. Alternative two-stage sampling designs, based on primary units of unequal sizes, can 
be used to dramatically increase accuracy of estimation of the total number of fish. 


The remainder of this report is devoted to a consideration of the costs and benefits of 
three alternative two-stage sampling designs that are appropriate when primary units 
are of unequal sizes. It will be assumed that a stream has been mapped and that habitat 
units have been grouped into two or more habitat type strata. The three alternative 
designs can be independently applied within strata, and estimated stratum totals 
and variances can be added across strata to generate estimates for the entire stream. 
Therefore, it is only necessary to consider application of these alternative designs within 
a particular stratum; for illustrative purposes, applications are within the pool stratum. 
Alternative designs could be applied with similar results in other habitat strata. 


The three alternative two-stage designs that will be considered may be classified by their 
selection method and by their use (or lack of use) of an auxiliary variable. An auxiliary 
variable is an attribute of a primary unit that can be inexpensively and easily measured;
it can be used to improve precision of estimation of the particular attribute of interest (the
target attribute). For small streams, a target attribute is the total number of fish in the 
pool stratum, and an auxiliary variable is pool size (area or volume). When the numbers 
of fish present in pools are positively correlated with pool sizes, use of pool size as an 
auxiliary variable can dramatically improve precision of estimators. 


Two of the alternative designs rely on selection of primary units (pools) by simple random 
sampling. For one of these designs (denoted two-stage SRS) no auxiliary variable is 
used; for the other design (ratio estimation) the auxiliary variable pool size (area) is used 
to improve precision of estimators. For the third design, the auxiliary variable pool size 
is used to calculate the probability of selecting pools. This third method is called 
selection of primary units with probabilities proportional to their size (PPS). The relative 
performances of these three alternative designs will first be illustrated using simple, 
single-stage examples because most of the errors of estimation come from the first 
stage of sampling (second-stage error is small). That is, fish numbers within pools will 
initially be enumerated rather than estimated so that there will be no second-stage error. 
Later, the relative performances of the three alternative designs will be contrasted in a 
more realistic, two-stage setting. 


Alternative sampling designs will be applied to the following small sampling universe of 
four pools: 


Pool number (i)    Pool size ($M_i$)    Number of fish ($Y_i$)
      1                   2                     4
      2                   3                    36
      3                   5                    44
      4                  10                   116
   Totals                20                   200


In each case, samples of size n = 2 pools will be drawn from this sampling universe of 
size N = 4 pools. The objective will be to estimate the total number of fish (Y = 200) in 
all four pools. Relevant collective attributes of this sampling universe include: 

$M_0 = \sum M_i = 20$ (total size of all pools); and $R = Y/M_0 = 200/20 = 10$ (average
number of fish per unit of pool size). “Size” could be area (m²) or volume (m³).


Design A: Two-Stage SRS

According to design A, a sample is drawn from the sampling universe by SRS and the
total number of fish in all pools is estimated as:

$$\hat{Y}_{srs} = \frac{N}{n} \sum_{i=1}^{n} Y_i .$$

The six possible SRS samples, estimated totals for each sample ($\hat{Y}_t$), and squared
deviations between estimated totals and the true total, $(\hat{Y}_t - Y)^2$, are as follows:

Sample units    $\hat{Y}_t$    $(\hat{Y}_t - Y)^2$
    1,2              80            14,400
    1,3              96            10,816
    1,4             240             1,600
    2,3             160             1,600
    2,4             304            10,816
    3,4             320            14,400
   Totals         1,200            53,632


Each of these samples is equally likely (because selection is by SRS), so the expected
value of $\hat{Y}_{srs}$ may be calculated as:

$$E(\hat{Y}_{srs}) = \sum_{t=1}^{6} \hat{Y}_t P_t = (1/6) \cdot 1{,}200 = 200 = Y .$$

This design results in an unbiased estimator ($E(\hat{Y}_{srs}) = Y$), so variance of the estimated
total (here equivalent to mean square error) can be calculated as:

$$V(\hat{Y}_{srs}) = \sum_{t=1}^{6} (\hat{Y}_t - Y)^2 P_t = (1/6) \cdot 53{,}632 = 8{,}939 .$$


Design B: Ratio Estimation

For design B, primary units are selected by SRS, but a measure of the size of selected 
units (an auxiliary variable) is incorporated into estimators. The total number of fish in 
all pools is now estimated as: 


$$\hat{Y}_{rat} = M_0 \sum_{i=1}^{n} Y_i \Big/ \sum_{i=1}^{n} M_i = M_0 \hat{R} ,$$

where $\hat{R} = \sum_{i=1}^{n} Y_i \big/ \sum_{i=1}^{n} M_i$ .


Ratio estimation has a simple intuitive basis. Based on the sample, one obtains an
estimate of the true number of fish per unit of pool size ($R$). An estimator for the total
number of fish in all pools would be the total size of all pools ($M_0$) times the estimated
number of fish per unit of pool size ($\hat{R}$, usually called the sample ratio). The six possible
SRS samples, estimated sample ratios ($\hat{R}_t$) and totals ($\hat{Y}_t$), and squared deviations
between estimated totals and the true total, $(\hat{Y}_t - Y)^2$, are:

Sample units    $\hat{R}_t$    $\hat{Y}_t$    $(\hat{Y}_t - Y)^2$
    1,2            8.00           160            1,600
    1,3            6.86           137            3,951
    1,4           10.00           200                0
    2,3           10.00           200                0
    2,4           11.69           234            1,145
    3,4           10.67           213              178
   Totals                       1,144            6,874


The expected value of the ratio estimator for the total number of fish in all four pools
would be:

$$E(\hat{Y}_{rat}) = \sum_{t=1}^{6} \hat{Y}_t P_t = (1/6) \cdot 1{,}144 = 190.7 \neq 200 .$$

Thus, the ratio estimator is biased:

$$\mathrm{BIAS}(\hat{Y}_{rat}) = E(\hat{Y}_{rat}) - Y = 190.7 - 200 = -9.3 .$$

For that reason, the appropriate measure of accuracy is mean square error rather than
variance:

$$\mathrm{MSE}(\hat{Y}_{rat}) = \sum_{t=1}^{6} (\hat{Y}_t - Y)^2 P_t = (1/6) \cdot 6{,}874 = 1{,}146 .$$

Variance of the ratio estimator could be calculated as:

$$V(\hat{Y}_{rat}) = \sum_{t=1}^{6} [\hat{Y}_t - E(\hat{Y}_{rat})]^2 P_t ,$$

but it is more easily calculated as the difference between mean square error and
squared bias:

$$V(\hat{Y}_{rat}) = \mathrm{MSE}(\hat{Y}_{rat}) - [\mathrm{BIAS}(\hat{Y}_{rat})]^2 = 1{,}146 - (-9.3)^2 = 1{,}059.5 .$$


Although the ratio estimator is slightly biased, variance is sufficiently small so that the
accuracy of design B is considerably greater than for design A. This improvement comes
from use of the auxiliary variable pool size ($M_i$), and from the positive correlation
between fish numbers and pool sizes. Estimated totals ($\hat{Y}_{rat}$) ranged from only 137 to
234 among all samples (as compared to a range of 80 to 320 for design A), and
variance was dramatically reduced as a result. This contrast in performance between
designs A and B provides a clear example of when a biased estimator might be preferred
over an unbiased estimator.
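
The comparison between designs A and B can be checked by enumerating all six equally likely SRS samples from the four-pool universe. The following Python sketch does this; the function names (`srs_estimate`, `ratio_estimate`, `summarize`) are illustrative choices and not part of the report.

```python
from itertools import combinations

# Four-pool sampling universe from the text: pool -> (size M_i, fish count Y_i).
POOLS = {1: (2, 4), 2: (3, 36), 3: (5, 44), 4: (10, 116)}
N, n = len(POOLS), 2
M0 = sum(m for m, _ in POOLS.values())   # total size of all pools (20)
Y = sum(y for _, y in POOLS.values())    # true total number of fish (200)


def srs_estimate(sample):
    """Design A: expand the sample total by N/n."""
    return N / n * sum(POOLS[i][1] for i in sample)


def ratio_estimate(sample):
    """Design B: total pool size times the sample ratio of fish to pool size."""
    fish = sum(POOLS[i][1] for i in sample)
    size = sum(POOLS[i][0] for i in sample)
    return M0 * fish / size


def summarize(estimator):
    """Expected value, bias, variance, and MSE over all equally likely SRS samples."""
    estimates = [estimator(s) for s in combinations(POOLS, n)]
    p = 1 / len(estimates)
    expected = sum(e * p for e in estimates)
    mse = sum((e - Y) ** 2 * p for e in estimates)
    bias = expected - Y
    return {"expected": expected, "bias": bias, "variance": mse - bias ** 2, "mse": mse}


srs = summarize(srs_estimate)
ratio = summarize(ratio_estimate)
for name, result in (("SRS", srs), ("ratio", ratio)):
    print(name, {k: round(v, 1) for k, v in result.items()})
```

Because the enumeration works with unrounded sample estimates, the results can differ in the last digit from values in the text that were computed from rounded table entries.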


Design C: PPS Without Replacement


For design C, primary units are selected by PPS without replacement. Although the
same possible samples result, by this method of selection all possible samples are not
equally likely (as they are using SRS selection). Larger pools are more likely to be
included in samples than are smaller pools. There are many possible ways to select
samples by PPS without replacement, but all selection methods produce two kinds of
probability assignments:

$\pi_i$ = probability that unit i is in a sample of size n; and

$\pi_{ij}$ = probability that units i and j are in a sample of size n.

When PPS without replacement is used to select samples, the total number of fish in all
pools is estimated as:

$$\hat{Y}_{pps} = \sum_{i=1}^{n} Y_i / \pi_i .$$

For illustrative purposes, PPS selection probabilities will be based on a method in which
successive units are selected with probabilities proportional to the sizes of the remaining
units (appendix 1 contains a summary of computations for this method). First-order
inclusion probabilities (the $\pi_i$'s) for the pool sampling universe, for samples of size
n = 2 pools, are:

Unit number    Size ($M_i$)    $\pi_i$
     1              2          0.2510
     2              3          0.3666
     3              5          0.5718
     4             10          0.8104


The largest pool is more than three times as likely to be in the sample ($\pi_4 = 0.8104$)
as the smallest pool ($\pi_1 = 0.2510$).
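
The selection method described above can be sketched in Python for samples of size n = 2. The sketch assumes that the first unit is drawn with probability proportional to its size and that the second is drawn from the remaining units with probability proportional to their sizes; the function names are hypothetical.

```python
from itertools import permutations

SIZES = {1: 2, 2: 3, 3: 5, 4: 10}   # pool sizes M_i from the four-pool example


def sample_probabilities(sizes):
    """P({i, j}) for n = 2 when each draw is proportional to the sizes still available."""
    total = sum(sizes.values())
    probs = {}
    for first, second in permutations(sizes, 2):
        p_first = sizes[first] / total
        p_second = sizes[second] / (total - sizes[first])
        key = frozenset((first, second))
        # Both draw orders (i then j, j then i) lead to the same unordered sample.
        probs[key] = probs.get(key, 0.0) + p_first * p_second
    return probs


def inclusion_probabilities(sizes):
    """First-order inclusion probability: sum of P(sample) over samples containing the unit."""
    probs = sample_probabilities(sizes)
    return {i: sum(p for s, p in probs.items() if i in s) for i in sizes}


pi = inclusion_probabilities(SIZES)
for unit, value in pi.items():
    print(unit, SIZES[unit], round(value, 4))
```

The first-order inclusion probabilities sum to n = 2, which provides a quick check on the computation.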


When n = 2, then the second-order inclusion probabilities (the $\pi_{ij}$'s) are the same
as the probabilities of individual samples. The six possible PPS without replacement
samples, estimated totals ($\hat{Y}_t$), squared deviations between estimated totals and the true
total, $(\hat{Y}_t - Y)^2$, and sample probabilities ($P_t = \pi_{ij}$) for the pool sampling universe are:

Sample units    $\hat{Y}_t$    $(\hat{Y}_t - Y)^2$    $P_t = \pi_{ij}$
    1,2           114.20            7,362               0.0343
    1,3            92.89           11,473               0.0611
    1,4           159.08            1,675               0.1556
    2,3           175.15              617               0.0941
    2,4           241.34            1,709               0.2382
    3,4           220.09              403               0.4167

Note that $\sum P_t = 1$ (as in SRS) and that sample probabilities vary and reflect sizes of
pools. The expected value of the PPS without replacement estimator can be calculated
as:

$$E(\hat{Y}_{pps}) = \sum_{t=1}^{6} \hat{Y}_t P_t = 114.20 \cdot 0.0343 + 92.89 \cdot 0.0611 + \ldots + 220.09 \cdot 0.4167 = 200.0 = Y .$$

Because $\hat{Y}_{pps}$ is unbiased, variance of the estimator can be calculated as:

$$V(\hat{Y}_{pps}) = \sum_{t=1}^{6} (\hat{Y}_t - Y)^2 P_t = 7{,}362 \cdot 0.0343 + 11{,}473 \cdot 0.0611 + \ldots + 403 \cdot 0.4167 = 1{,}847 .$$

Variance for the PPS without replacement estimator [$V(\hat{Y}_{pps}) = 1{,}847$] is substantially
less than for the two-stage SRS estimator [$V(\hat{Y}_{srs}) = 8{,}939$] and the PPS estimator is
also unbiased. But because accuracy of the PPS estimator is slightly less than for the
ratio estimator [$\mathrm{MSE}(\hat{Y}_{rat}) = 1{,}146$], it would be difficult to choose between designs B
and C on the basis of accuracy alone.


Determining the Best Choice Among Alternative Designs

The preceding applications of the three alternative designs allowed a comparison of the 
accuracies and degrees of bias of the three estimators. Choice among the alternative 
designs must also include, however, a consideration of the total costs of obtaining 
estimates, usually termed total survey costs. Just as one can compare the relative 
efficiencies of alternative designs, one can also compare the relative cost (RC) of one 
design to another. The relative cost of design b compared to design a is defined (for a
fixed sample size, n) as:

$$RC(b/a) = C_b / C_a ,$$

where $C_a$ and $C_b$ are the total survey costs for designs a and b.


Total survey costs for estimating the total number of fish in small streams can be 
separated into two distinct categories: (1) costs that are independent of the particular 
selected units; and (2) costs that directly depend on the particular selected units. The 
first category (fixed costs) includes housing, per diem and travel (to and from the study 
site, and between selected units), and time spent setting up and taking down block nets 
if electrofishing is used in selected units. The second category includes time actually 
spent in selected units to estimate fish numbers. 


The average total size of n units selected by PPS without replacement will be greater 
than the average total size of n units selected by SRS because the PPS design assigns 
higher selection probabilities to larger units. Because it takes longer to sample a large 
pool than a small pool, it will cost more to sample the same number of pools when they 
are selected by PPS (design C) than when they are selected by SRS (designs A and B). 


For the traditional, equal-sized primary unit design, total costs of a stream survey are 
probably roughly split in half between the two cost categories. It seems reasonable to 
assume that this would also be the case if unequal-sized primary units were selected by 
SRS. This assumption leads to a simple cost function of the form: 


$$C_{srs} = 0.5\,C_{srs} + \alpha X_{srs} ,$$

where: $X_{srs}$ = expected (average) total size of n units selected by SRS; and
       $\alpha$ = cost per unit of size, such that $\alpha X_{srs} = 0.5\,C_{srs}$.


A comparable cost function for the PPS design would be:

$$C_{pps} = 0.5\,C_{srs} + \alpha X_{pps} ,$$

where $X_{pps}$ = expected (average) total size of n units selected by PPS without
replacement.


The expected (average) total size of n units selected by SRS is:

$$X_{srs} = n \sum_{i=1}^{N} M_i / N ,$$

whereas that for n units selected by PPS without replacement is:

$$X_{pps} = \sum_{i=1}^{N} M_i \pi_i .$$

Expected total sizes of selected units are used for comparisons because they reflect the 
overall average behavior of the design. Actual total sizes of selected units would depend 
on the particular samples selected. 


The RC of the PPS design (design C) compared to the SRS designs (A and B) can be
determined by normalizing the total cost of the SRS designs. That is, let $C_{srs} = 1$. Then,
$\alpha = 0.5/X_{srs}$ and the expected total cost of the PPS design would be
$C_{pps} = 0.5 + \alpha X_{pps}$. RC(PPS/SRS) would then equal $C_{pps}/C_{srs}$. This procedure can be illustrated
using the same small sampling universe of four pools. The expected total size of two of
these four units selected by SRS was $X_{srs} = n \sum M_i / N = 2 \cdot 20/4 = 10$, so that
$\alpha = 0.5/10 = 0.05$. The expected total size of two units selected by PPS without
replacement was:

$$X_{pps} = \sum_{i=1}^{4} M_i \pi_i = 2 \cdot 0.2510 + 3 \cdot 0.3666 + 5 \cdot 0.5718 + 10 \cdot 0.8104 = 12.5648 .$$

The expected total cost of the PPS design was:

$$C_{pps} = 0.5 + 0.05 \cdot 12.5648 = 1.128 ;$$

and RC(PPS/SRS) = $C_{pps}/C_{srs}$ = 1.128/1 = 1.128 = $C_{pps}$. Thus, normalizing total
survey costs made $C_{pps}$ = RC(PPS/SRS).
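
The cost comparison above can be sketched in Python as follows. The function names and the `fixed_share` parameter (the fraction of SRS survey cost that does not depend on the selected units, 0.5 in the text) are illustrative assumptions rather than notation from the report.

```python
def expected_size_srs(sizes, n):
    """Expected total size of n units drawn by SRS: n * sum(M_i) / N."""
    return n * sum(sizes) / len(sizes)


def expected_size_pps(sizes, inclusion_probs):
    """Expected total size under PPS without replacement: sum(M_i * pi_i)."""
    return sum(m * p for m, p in zip(sizes, inclusion_probs))


def relative_cost_pps(sizes, inclusion_probs, n, fixed_share=0.5):
    """RC(PPS/SRS) with the total cost of the SRS designs normalized to 1."""
    alpha = (1 - fixed_share) / expected_size_srs(sizes, n)   # cost per unit of size
    return fixed_share + alpha * expected_size_pps(sizes, inclusion_probs)


sizes = [2, 3, 5, 10]
pi = [0.2510, 0.3666, 0.5718, 0.8104]   # first-order inclusion probabilities from the text
x_pps = expected_size_pps(sizes, pi)
rc = relative_cost_pps(sizes, pi, n=2)
print(round(x_pps, 4), round(rc, 3))
```

Changing `fixed_share` shows how sensitive the relative cost is to the assumed split between fixed and size-dependent costs.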


Finally, net relative efficiency (NRE) is a measure of the cost-effectiveness of
alternative designs and is defined (for a fixed sample size, n) as:

$$NRE(b/a) = RE(b/a)/RC(b/a) = \frac{V_a(\hat{\theta})\,C_a}{V_b(\hat{\theta})\,C_b} .$$

For the small sampling universe used for examples, the net relative efficiency of ratio
estimation as compared to the two-stage SRS design is equivalent to the relative
efficiency because total survey costs are the same. The net relative efficiency of the
PPS design compared to the SRS design, however, would be NRE(PPS/SRS) =
(8,939 · 1)/(1,847 · 1.128) = 4.29; and the net relative efficiency of the PPS design compared
to the ratio estimation design would be NRE(PPS/ratio) = (1,059 · 1)/(1,847 · 1.128) =
0.508. Thus, in this case, the PPS design was about four times as cost-effective as the
two-stage SRS design, but only about half as cost-effective as the ratio estimation
design. Choice of sampling design should be based on net relative efficiency because
it balances possible improvements in efficiency (that is, reductions in variance or mean
square error) against possible increases in total survey costs. On the basis of net relative
efficiency, the ratio estimation design would be the design of choice for the small
sampling universe of four pools (but see “Discussion”).


Realistic Applications

The three alternative two-stage sampling designs were applied in their full two-stage 
forms to a realistic, large sampling universe (N = 50) constructed on the basis of data 
collected from Knowles Creek, a small-stream tributary to the Siuslaw River in Oregon. 
A two-pass electrofishing/removal method estimator was assumed to have been used 
to estimate the number of fish in selected primary units and electrofishing capture
probability was set to 0.50. Relevant formulas for the full two-stage designs are 
presented in appendix 2, and details of their application are presented in appendix 3. 
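
For readers unfamiliar with removal estimation, the following Python sketch shows the commonly used two-catch form of the removal estimator. It is offered only as an illustration under that assumption: the formulas actually used in this report are those in appendix 2 and may differ in detail, and the function name and example catches are hypothetical.

```python
def two_pass_removal(c1, c2):
    """Two-pass removal estimates of abundance and capture probability.

    c1, c2: numbers of fish caught (and removed) on the first and second pass.
    Assumes equal capture probability on both passes and a closed section.
    """
    if c1 <= c2:
        raise ValueError("estimator is undefined unless the first catch exceeds the second")
    n_hat = c1 ** 2 / (c1 - c2)   # estimated number of fish in the unit
    p_hat = (c1 - c2) / c1        # estimated capture probability per pass
    return n_hat, p_hat


# Hypothetical catches consistent with a capture probability of 0.50.
n_hat, p_hat = two_pass_removal(50, 25)
print(n_hat, p_hat)
```

With catches of 50 and 25, the catch halves between passes, so the estimated capture probability is 0.5 and the estimated abundance is twice the first-pass catch.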


Figure 2 shows a plot of the number of fish (yearling coho salmon, Oncorhynchus
kisutch) in Knowles Creek pools against the sizes (areas, in m²) of pools. Although there
is a substantial amount of variation in fish numbers for a given pool size, there is a
significant increasing trend of fish numbers with pool size. Pool size accounts for roughly
58 percent of the variation in fish numbers (R² = 0.58 for a linear regression of fish
number against pool size).


Figure 3 shows sampling variances (or mean square error) for two-stage SRS, ratio 
estimation, and PPS without replacement sampling designs as a function of sample 
size. Sampling variances of alternative designs follow a consistent pattern over all 
sample sizes: variance of the SRS design is greatest; that of ratio estimation is 
noticeably less and is, essentially, a constant fraction of variance for the SRS design; 
and variance of the PPS design is substantially less than for ratio estimation. The 
magnitude of variance for the PPS design, compared to the other designs, generally 
improves with sample size. 


[Figure 2: scatter plot of number of fish (vertical axis) against pool size in m² (horizontal axis, 0 to 600); r = 0.760; n = 50 pools.]

Figure 2. Total number of fish plotted against pool size for the
sampling universe used for realistic applications of alternative
two-stage sampling designs. Based on data collected from Knowles
Creek, Oregon, by the USDA Forest Service.


[Figure 3: sampling variance (vertical axis) against sample size (horizontal axis, up to 50), with curves labeled SRS, RATIO, and PPSWOR.]

Figure 3. Sampling variances ($V(\hat{Y})$) for two-stage SRS (SRS),
ratio estimation (RATIO) and PPS without replacement (PPSWOR)
sampling designs plotted against sample size for the Knowles Creek
sampling universe. Values plotted for the ratio estimation design are
actually $\mathrm{MSE}(\hat{Y})$ because the design is biased.



[Figure 4: net relative efficiency (vertical axis) against sample size (horizontal axis, 0 to 50).]

Figure 4. Net relative efficiencies of PPS without replacement
(PPSWOR) and ratio estimation (RATIO) designs (as compared to
the two-stage SRS design) plotted against sample size for the
Knowles Creek sampling universe.


Figure 4 shows the net relative efficiencies of the ratio estimation and PPS designs 
compared to the SRS design as a function of sample size. Net relative efficiency of ratio 
estimation is about 2.7 and is essentially independent of sample size (for n < 40). For 
the PPS design, net relative efficiency increases with sample size until it exceeds 12 
when n = 35. Based on the criterion of cost-effectiveness, both ratio estimation and 
PPS without replacement offer substantial improvements over the SRS design. 
Depending on sample size, the PPS without replacement design could be more than 10 
times as cost-effective as the SRS design. 


[Figure 5: sampling variance (vertical axis) against sample size (horizontal axis, 0 to 50), with curves labeled 1st/SRS, 1st/PPSWOR, 2d/PPSWOR, and 2d/SRS.]

Figure 5. First- and second-stage sampling variances for the
two-stage SRS (SRS) and PPS without replacement (PPSWOR)
designs plotted against sample size for the Knowles Creek
sampling universe.


Figure 5 shows that the striking performance of the PPS design was achieved entirely 
through dramatic and rapid reduction of first-stage variance with increasing sample size. 
First-stage variance was almost always at least an order of magnitude larger than 
second-stage variance for the SRS design. In contrast, first-stage variance rapidly 
decreased for the PPS design until (for n ≥ 25) it was actually less than second-stage
variance. The PPS design effectively addressed the usually large first-stage variance 
problem that is associated with the traditional two-stage design having equal-sized 
primary units. 


Discussion


The simple single-stage examples and the more realistic two-stage examples presented 
here illustrate the importance of sampling design in determining errors of estimation of 
the total number of fish in small streams. Although fishery biologists have paid a great 
deal of attention to errors of estimation within selected sample sections (second-stage 
variance), they have not devoted much attention to errors of extrapolation from the small 
number of sections sampled to an entire stream (first-stage variance). Most fishery 
biologists receive extensive training in population estimation and this training can be 
used to help reduce second-stage variance. Reduction of first-stage variance requires 
knowledge of sampling theory, however, and few fishery biologists receive formal 
training in this area. 


The traditional two-stage sampling design with equal-sized primary units (equal-length 
sections of stream constitute the primary sampling units) results in large first-stage 
variance. Although primary units are of equal size, they may vary greatly in habitat 
quality and in the number of fish that they can support. The only way to reduce this large 
first-stage variance when using the traditional design is to increase the number of 
sampled sections. This significantly increases total survey costs. 


Stratification of stream habitat into pools, riffles, and other habitat types, coupled with 
independent sampling in each constructed stratum, can help reduce first-stage variance 
by limiting variation in habitat quality among sampled sections. When stratification is 
employed, however, the primary units become equivalent to the natural habitat units and 
are then of unequal sizes. The unequal sizes of primary units increase the complexity 
of applicable sampling designs, but also offer a wide choice of alternative designs. Of 
three such alternative designs considered here, both ratio estimation and PPS without 
replacement appear to have considerable promise for improving the precision and 
accuracy of estimates of the total number of fish in small streams. Both designs take 
advantage of the usually strong positive correlation between fish numbers and habitat 
unit sizes; greater numbers of fish are usually found in large pools than in small pools. 


Evaluation of the relative performances of alternative designs required their application 
to specific sampling universes and calculation of the expected (average) behavior of 
estimators. Because total survey costs varied among designs (as a result of primary unit 
selection method), it was necessary to calculate total survey costs in addition to 
estimator sampling variance (or mean square error) in order to calculate a measure of 
the cost-effectiveness of alternative designs. Choice of design should be based on 
cost-effectiveness (net relative efficiency) rather than on accuracy alone. 
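
The cost-effectiveness calculation summarized above can be sketched in Python using the values reported for the four-pool example (variances of 8,939, 1,059, and 1,847 for the SRS, ratio, and PPS designs, and a normalized PPS cost of 1.128). The function name is an illustrative choice.

```python
def net_relative_efficiency(var_a, cost_a, var_b, cost_b):
    """NRE(b/a) = RE(b/a) / RC(b/a) = (V_a / V_b) / (C_b / C_a)."""
    relative_efficiency = var_a / var_b
    relative_cost = cost_b / cost_a
    return relative_efficiency / relative_cost


# Values reported for the four-pool example; costs are normalized so the SRS designs cost 1.
V_SRS, V_RATIO, V_PPS = 8939, 1059, 1847
C_SRS = C_RATIO = 1.0
C_PPS = 1.128

nre_pps_vs_srs = net_relative_efficiency(V_SRS, C_SRS, V_PPS, C_PPS)
nre_pps_vs_ratio = net_relative_efficiency(V_RATIO, C_RATIO, V_PPS, C_PPS)
print(round(nre_pps_vs_srs, 2), round(nre_pps_vs_ratio, 3))
```

A value above 1 favors design b over design a once survey costs are taken into account; a value below 1 favors design a.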


The simple cost functions used in this report could be made more realistic in two 
respects. First, time spent electrofishing a selected unit may not be linearly related to 
pool (or riffle) area, but may instead increase as some power of pool area. This 
possibility was examined by Hankin (1984) but had little impact on relative performances 
of alternative designs. Second, it may be necessary to add an additional term to cost 
functions to account for the different stream mapping requirements of the alternative 
designs. When primary units are of unequal sizes, the three alternative sampling 
designs (as presented in this report) all require a map of the locations of all primary units. 
This is necessary so that the sampling universe can be specified. Without a specified 
sampling universe, it is impossible to compare the expected performances of alternative 
designs. The quality of maps required for the different designs may vary substantially, 
however. For the two-stage SRS design, only the location of the primary units is required 
and there is no need to measure primary unit sizes. In contrast, both ratio estimation
and PPS without replacement designs require some measure of primary unit sizes in
addition to locations. Increased mapping expenses for the ratio estimation and PPS 
designs would increase the relative costs for these designs compared to the SRS 
design. The calculations of cost-effectiveness presented in this report should therefore 
be viewed with some skepticism. 


Ratio estimation requires (1) the sizes of primary units that appear in the sample of n 
units, and (2) the total size of all primary units. In a practical sense, the latter requirement 
means that the sizes of all primary units have to be determined. The PPS design 
requires either accurate measurements of all primary unit sizes, or some less accurate 
measurements of primary unit sizes that are highly correlated with the true primary unit 
sizes. For example, if a reach of stream were of fairly uniform width, then length of pool 
would be highly correlated with pool area. Alternatively, visual (“eye”) estimates of 
primary unit sizes could be used to assign PPS selection probabilities if estimates were 
highly correlated with the true sizes. The marginal cost of obtaining these simpler 
measurements of primary unit sizes would probably be small compared to costs of 
locating primary units (which must be done for all designs). 


There is an additional way to reduce mapping costs for both SRS and ratio estimation 
designs that was not considered in this report but that has substantial merit. Instead of 
selecting primary units by SRS, one could instead select units by systematic sampling. 
In systematic sampling, one chooses a random start from the integers 1 through K, and 
then selects every Kth unit. In the context of a stream survey, 1/K could be the desired
sampling intensity (fraction of habitat units that are sampled). No map is needed for 
systematic sampling. A field crew could proceed upstream or downstream and pick, say, 
every 10th pool (or riffle) for sampling. When the survey was complete, the total number 
of pools (or riffles) would be known and, if each sampled unit were measured, the total 
size of all habitat units could be estimated. 
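
The systematic selection rule described above can be sketched in Python. The function names, the pool numbering, and the pool areas are hypothetical; the size estimator shown (number of units counted times the mean size of sampled units) is one simple possibility assumed here for illustration and is not a formula given in this report.

```python
import random


def systematic_sample(units, k, rng=random):
    """Keep every k-th unit encountered, starting from a random position in 1..k."""
    start = rng.randint(1, k)
    return [unit for position, unit in enumerate(units, start=1)
            if position >= start and (position - start) % k == 0]


def estimate_total_size(num_units, sampled_sizes):
    """Assumed estimator: number of units counted times mean size of the sampled units."""
    return num_units * sum(sampled_sizes) / len(sampled_sizes)


rng = random.Random(1986)             # fixed seed so the sketch is repeatable
pools = list(range(1, 51))            # hypothetical stream: 50 pools in walking order
picked = systematic_sample(pools, k=10, rng=rng)
hypothetical_sizes = [40.0, 55.0, 30.0, 80.0, 45.0]   # made-up areas (m^2) of sampled pools
total_size = estimate_total_size(len(pools), hypothetical_sizes)
print(picked, total_size)
```

Because `systematic_sample` accepts any iterable, the same rule applies when the number of pools is unknown until the crew reaches the end of the stream.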


The systematic sampling approach would allow valid application of the two-stage SRS 
design and (with some minor modifications) the ratio estimation design. The SRS design 
would still be unbiased, but systematic sampling would rule out unbiased estimation of 
variance. For that reason, the systematic sampling approach was not formally presented 
in this report. Nevertheless, systematic sampling could prove practical and cost- 
effective. It will rarely perform worse than SRS, and it will usually perform better. 
Formulas presented for the two-stage SRS design could also be used if units were 
selected by systematic sampling; they would tend to overestimate the true sampling 
variance and would therefore provide conservative estimates of sampling variance. 


In many situations, fishery biologists may not have any preliminary data with which to 
construct plausible sampling universes and thereby judge the probable performances of 
alternative designs prior to field work. In such instances, it seems most reasonable to 
first sample using the SRS design and then to compare estimated variances of the SRS 
design (equation 3, appendix 2) with estimated mean square error of the ratio estimation 
design (equation 6, appendix 2). Because the sample-based estimator of mean square 
error for ratio estimation requires only measurements of those units that are selected, the
cost of making this comparison would be small. If sample sizes exceeded about 12 (see 
appendix 2; table 1), then the comparison would be valid. If it showed that mean square 
error for ratio estimation was far less than for the two-stage SRS design, then it would 
be worth pursuing the ratio estimation or PPS designs (with their increased mapping 
expenses) in future survey work. Use of the PPS design without preliminary data is
definitely not recommended. Table 1 summarizes those conditions under which
alternative designs should be effective, their probable costs, and their requirements
and/or restrictions.


Table 1. Guidelines for use, mapping needs, probable relative survey costs
(compared to the SRS design), and restrictions and/or requirements for alternative
two-stage sampling designs with unequal-sized primary units

Design: Two-stage SRS
  Conditions when design should be effective:
    Weak correlation between fish numbers and primary unit sizes (r < 0.4)
    Small range (< 4-fold) in primary unit sizes
  Mapping requirements:
    Location of all primary units
    Stratification by primary unit sizes may improve performance
  Total survey costs (for fixed sample size):
    Base for comparison of alternative designs
  Restrictions and/or requirements:
    Can be used for any size sampling universe and any sample size
    Always use for preliminary surveys

Design: Ratio estimation
  Conditions when design should be effective:
    Strong correlation between primary unit sizes and fish numbers (r > 0.5)
    Substantial range (> 4-fold) in primary unit sizes
  Mapping requirements:
    Location of all primary units
    Sizes of all primary units in sample
    Total size of all units
  Total survey costs (for fixed sample size):
    Greater: increased mapping costs
  Restrictions and/or requirements:
    Can be used for any size sampling universe
    Sample size should exceed 12

Design: PPS without replacement
  Conditions when design should be effective:
    Moderate correlation between fish numbers and primary unit sizes (r > 0.4)
    Substantial range (> 4-fold) in primary unit sizes
  Mapping requirements:
    Location of all primary units
    Suitable measure of sizes of all primary units; may be visual estimates
  Total survey costs (for fixed sample size):
    Greater: increased mapping costs
    Average size of units in sample will be larger than for SRS
  Restrictions and/or requirements:
    Sampling universe should be no larger than N = 50 to 100
    Computer required for all computations
    Biometrician should be consulted prior to use


Regardless of the choices made among alternative sampling designs, the practice of 
allowing natural habitat units to dictate the primary sampling units seems wise. 
Displacement of fish from sampled sections due to setting block nets, mixture of habitat 
types within sampled sections, and the impossibility of placing block nets in deep pools 
should all be minimized or eliminated. Analysis of estimates generated from distinctive 
habitat unit types of varying sizes can allow one to draw important conclusions regarding
relationships among habitat unit sizes and types and fish abundance. These conclusions
are difficult, if not impossible, to draw when primary units are of equal sizes. Adoption
of unequal-sized primary unit sampling designs should improve accuracy of estimation
of fish numbers, but more importantly it should improve our understanding of the
dynamics of fish populations in small streams.


English Equivalents

1 meter (m) = 39.37 inches or 3.28 feet
1 square meter (m²) = 10.7639 square feet


Literature Cited

Bohlin, T. Methods of estimating total stock, smolt output and survival of salmonids 
using electrofishing. Institute of Freshwater Research Drottingholm Report. 59: 
5-14; 1981. 


Chao, M.T. A general purpose unequal probability sampling plan. Biometrika. 69:
653-656; 1982. 


Cochran, W.G. Sampling techniques. New York: Wiley; 1977. 428 p. 


Everhart, H.; Youngs, W.D. Principles of fishery science. Ithaca, NY: Cornell University 
Press; 1981. 349 p. 


Goodman, L.A. On the exact variance of products. Journal of the American Statistical 
Association. 57: 54-60; 1960. 


Hankin, D.G. Multistage sampling designs in fisheries research: applications in small 
streams. Canadian Journal of Fisheries and Aquatic Sciences. 41: 1575-1591; 1984. 


Jessen, R.J. Statistical survey techniques. New York: Wiley; 1978. 520 p.


Raj, Des. Sampling theory. New York: McGraw-Hill; 1968. 302 p.


Robson, D.S.; Regier, H.A. Sample size in Petersen mark-recapture experiments. 
Transactions of the American Fisheries Society. 93: 215-226; 1964. 


Seber, G.A.F. The estimation of animal abundance. New York: Macmillan; 1982. 654 p.




Appendix 1 


Computation of Selection 
Probabilities for the 
PPS Design 




All PPS selection methods require calculation of (at least) the following inclusion 
probabilities: 


$\pi_i$ = probability that unit i is in the sample (i = 1, 2, ..., N); and
$\pi_{ij}$ = probability that units i and j are in the sample (i ≠ j).


When samples are of size n = 2, then $\pi_{ij}$ is equivalent to $P_t$, where $P_t$ is the probability
of drawing the $t$th sample (that particular sample that contains the units i and j). Inclusion
probabilities will depend on three things: (1) sample size, n; (2) the sizes of units in the
sampling universe (or some other measurement of unit size that is used to assign
selection probabilities to units); and (3) the particular PPS without replacement selection
method that is used. Selection probabilities for the case of sampling n = 2 from N = 4,
used for illustrative purposes in this report, were calculated using a selection method by
which units are selected with probabilities proportional to the sizes of the remaining units.


Calculations for the single stage example in design C were as follows: 
1. Calculate $p_i = M_i/M_0$, where $M_0 = \sum_{i=1}^{N} M_i$. These are the probabilities that
unit i will be selected on the first draw.

2. Calculate $p(j|i) = p_j/(1-p_i)$. These are the conditional probabilities of drawing the $j$th
unit on the second draw given that unit i was selected on the first draw. These
conditional probabilities are equivalent to those probabilities that would be calculated
on the basis of the sizes of the remaining units at the time of the second draw. The
table below lists the $p(j|i)$ for all possible orders of selection:


Units selected

First draw (i)    Second draw (j)    Conditional probability p(j|i)

      1                  2                    0.1666
      1                  3                    0.2777
      1                  4                    0.5555
      2                  1                    0.1176
      2                  3                    0.2941
      2                  4                    0.5882
      3                  1                    0.1333
      3                  2                    0.2000
      3                  4                    0.6666
      4                  1                    0.2000
      4                  2                    0.3000
      4                  3                    0.5000


3. Calculate $\pi_{ij} = P_t = p_i\,p(j|i) + p_j\,p(i|j)$. The two terms in this sum give the
probabilities of drawing the ordered samples (i, j) and (j, i). That is, for example, the
ordered sample (2, 3) would be drawn as:

A. Probability of drawing unit 2 on first draw = $p_2$ = 0.15.

B. Probability of drawing unit 3 on second draw given that unit 2 was drawn on first
draw = $p(3|2)$ = 0.2941.

Thus, the probability of drawing unit 2 on the first draw and then unit 3 on the second
draw would be: 0.15 × 0.2941 = 0.044115. Similarly, the probability of drawing the
ordered sample (3, 2) would be: $p_3\,p(2|3)$ = 0.25 × 0.20 = 0.0500.


The probability of the particular sample that contained the units 2 and 3, without regard 
to order, would then be obtained as the sum of the probabilities of the two possible 
ordered samples: 


$$\pi_{23} = \pi_{32} = p_2\,p(3|2) + p_3\,p(2|3) = 0.044115 + 0.0500 = 0.094115.$$


Analogous computations resulted in the figures presented for $\pi_{ij}$ in the single-stage
example for the PPS without replacement design (page 15).
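Steps 1 through 3 can be sketched in a few lines of Python. This is only an illustrative sketch, not the author's APL program. The function names are hypothetical, and of the first-draw probabilities used here only $p_2$ = 0.15 and $p_3$ = 0.25 are quoted in the text; $p_1$ = 0.10 and $p_4$ = 0.50 are inferred from the tabled conditional probabilities and should be treated as an assumption.

```python
from itertools import permutations


def pairwise_inclusion_probs(p):
    """Return {(i, j): pi_ij} for samples of size n = 2 drawn with
    probabilities proportional to the sizes of the remaining units.

    p[k] is the first-draw probability of unit k + 1 (p_i = M_i / M_0).
    """
    pi_ij = {}
    for i, j in permutations(range(len(p)), 2):
        # Probability of the ordered sample (i, j): p_i * p(j|i).
        ordered = p[i] * p[j] / (1.0 - p[i])
        key = (min(i, j) + 1, max(i, j) + 1)  # 1-based, unordered pair
        pi_ij[key] = pi_ij.get(key, 0.0) + ordered
    return pi_ij


def unit_inclusion_probs(pi_ij, N):
    """For n = 2, pi_i is the sum of pi_ij over all pairs containing unit i."""
    return [sum(v for pair, v in pi_ij.items() if i in pair)
            for i in range(1, N + 1)]


# p_2 and p_3 come from the text; p_1 and p_4 are inferred (assumption).
p = [0.10, 0.15, 0.25, 0.50]
pi_ij = pairwise_inclusion_probs(p)
pi_i = unit_inclusion_probs(pi_ij, len(p))

# Close to the 0.094115 obtained in the text, which uses the rounded p(3|2).
print(round(pi_ij[(2, 3)], 4))
```

Two checks are useful with any such calculation: the $\pi_{ij}$ over all distinct pairs should sum to 1 when n = 2, and the $\pi_i$ should sum to n.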


Selection of primary units with probabilities proportional to the sizes of remaining units
is a method that cannot be easily extended to selection of large samples from large
sampling universes. In practice, the requirement that probabilities of all possible ordered
samples be calculated in order to obtain the probabilities for the unordered samples
rules out application of this method for any but small samples (n ≤ 5) drawn from small
sampling universes (N < 25). The sheer magnitude of necessary calculations quickly
exceeds any reasonable expectations for modern computers. For example, the number
of ordered samples of size n = 10 selected from a universe of size N = 50 is N!/(N-n)!
= 50!/40! ≈ 3.73 × 10^16.
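The growth in the number of ordered samples is easy to verify. The short sketch below uses Python's standard `math.perm` and `math.comb`; the wrapper function names are hypothetical and are used only for illustration.

```python
import math


def n_ordered_samples(N, n):
    """Number of ordered samples of size n drawn without replacement: N!/(N-n)!."""
    return math.perm(N, n)


def n_unordered_samples(N, n):
    """Number of distinct (unordered) samples of size n from N units."""
    return math.comb(N, n)


# The illustrative case n = 2 from N = 4 has 12 ordered samples,
# one for each row of the table of conditional probabilities.
print(n_ordered_samples(4, 2))

# The case n = 10 from N = 50 has roughly 3.73e16 ordered samples.
print(n_ordered_samples(50, 10))
```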


For the more realistic two-stage applications of the alternative designs contrasted in this
report, selection of PPS samples was made according to a method developed by Chao
(1982). This selection method does not require construction of all possible ordered
samples and it can be readily extended to selection of large sample sizes from large
sampling universes. The method does not perform well, however, for very small sample
sizes (usually n < 3 or 4) because some $\pi_{ij}$ may be zero.


By using either selection with probabilities proportional to the sizes of the remaining 
units or Chao’s method, as appropriate, one can effectively use PPS without replace- 
ment designs for small- and moderate-sized sampling universes (N < 100). When N > 
100, current computing time and costs may rule out use of this design. Given the rates 
of advancement in computer technology, however, methods such as Chao’s will 
probably be extendable to much larger sampling universes in the near future. Additional 
technical details concerning the above two PPS without replacement selection methods 
can be found in Hankin (1984, appendix B). 




Appendix 2


Two-Stage Estimators
for Alternative Designs


When there are two stages of sampling, primary units are of unequal sizes, and some
population estimator is used to generate estimated primary unit totals ($\hat{Y}_i$) and estimated
variances for the estimated totals ($\hat{V}(\hat{Y}_i)$), the appropriate estimators are as follows:


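Design A.—Two-stage SRS (these are the same as those for the traditional design with
equal-sized primary units):

$$\hat{Y}_{srs} = \frac{N}{n}\sum_{i=1}^{n}\hat{Y}_i ; \tag{1}$$

$$V(\hat{Y}_{srs}) = \frac{N(N-n)}{n}\,\frac{\sum_{i=1}^{N}(Y_i-\bar{Y})^2}{N-1} + \frac{N}{n}\sum_{i=1}^{N}V(\hat{Y}_i) ; \text{ and} \tag{2}$$

$$\hat{V}(\hat{Y}_{srs}) = \frac{N(N-n)}{n}\,\frac{\sum_{i=1}^{n}(\hat{Y}_i-\hat{\bar{Y}})^2}{n-1} + \frac{N}{n}\sum_{i=1}^{n}\hat{V}(\hat{Y}_i) ; \tag{3}$$

where $\bar{Y} = \sum_{i=1}^{N} Y_i/N$ and $\hat{\bar{Y}} = \sum_{i=1}^{n}\hat{Y}_i/n$.


Design B.—Ratio estimation:

$$\hat{Y}_{rat} = M_0\,\frac{\sum_{i=1}^{n}\hat{Y}_i}{\sum_{i=1}^{n}M_i} ; \tag{4}$$

$$MSE(\hat{Y}_{rat}) \approx \frac{N(N-n)}{n}\,\frac{\sum_{i=1}^{N}M_i^2(\bar{Y}_i-\bar{\bar{Y}})^2}{N-1} + \frac{N}{n}\sum_{i=1}^{N}V(\hat{Y}_i) ; \text{ and} \tag{5}$$

$$\widehat{MSE}(\hat{Y}_{rat}) \approx \frac{N(N-n)}{n}\,\frac{\sum_{i=1}^{n}M_i^2(\hat{\bar{Y}}_i-\hat{\bar{\bar{Y}}})^2}{n-1} + \frac{N}{n}\sum_{i=1}^{n}\hat{V}(\hat{Y}_i) ; \tag{6}$$

where: $\bar{Y}_i = Y_i/M_i$;

$\bar{\bar{Y}} = \sum_{i=1}^{N}Y_i/\sum_{i=1}^{N}M_i = Y/M_0$;

$\hat{\bar{Y}}_i = \hat{Y}_i/M_i$; and

$\hat{\bar{\bar{Y}}} = \sum_{i=1}^{n}\hat{Y}_i/\sum_{i=1}^{n}M_i$.


Both equations (5) and (6) are large-sample approximations to mean square error of $\hat{Y}_{rat}$
and should be used only for n > 12 (Cochran 1977, p. 162-164).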
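The point estimators and variance estimators for designs A and B can be sketched as follows. This is a hedged illustration only: it assumes the standard two-stage forms, in which a between-unit term scaled by N(N-n)/n is added to N/n times the sum of the estimated second-stage variances. The function names and the data are hypothetical. The sample of n = 3 is used only to keep the arithmetic visible and falls well short of the n > 12 guideline for the ratio estimator.

```python
def srs_total(N, y_hat, v_hat):
    """Two-stage SRS estimate of the stream total and its estimated variance.

    N      number of primary units in the sampling universe
    y_hat  estimated totals for the n sampled primary units
    v_hat  estimated second-stage variances of those totals
    """
    n = len(y_hat)
    y_bar = sum(y_hat) / n
    s2 = sum((y - y_bar) ** 2 for y in y_hat) / (n - 1)
    total = N / n * sum(y_hat)
    variance = N * (N - n) / n * s2 + N / n * sum(v_hat)
    return total, variance


def ratio_total(N, M0, sizes, y_hat, v_hat):
    """Ratio estimate of the stream total and its estimated mean square error.

    M0     total size (for example, area) of all N primary units
    sizes  sizes M_i of the n sampled primary units
    """
    n = len(y_hat)
    density = sum(y_hat) / sum(sizes)  # estimated fish per unit of size
    total = M0 * density
    s2 = sum(m ** 2 * (y / m - density) ** 2
             for m, y in zip(sizes, y_hat)) / (n - 1)
    mse = N * (N - n) / n * s2 + N / n * sum(v_hat)
    return total, mse


# Hypothetical data: fish numbers exactly proportional to unit size.
N, M0 = 10, 100.0
sizes = [5.0, 10.0, 20.0]
y_hat = [10.0, 20.0, 40.0]
v_hat = [1.0, 2.0, 3.0]

srs = srs_total(N, y_hat, v_hat)
rat = ratio_total(N, M0, sizes, y_hat, v_hat)
print(srs)
print(rat)
```

With these made-up numbers the between-unit term of the ratio estimator vanishes because fish numbers are exactly proportional to size, whereas the SRS estimator carries the full between-unit variation. That is the mechanism by which the ratio design gains accuracy when numbers and sizes are strongly correlated.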




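Design C.—PPS without replacement (see Raj 1968, p. 118-119):

$$\hat{Y}_{pps} = \sum_{i=1}^{n}\hat{Y}_i/\pi_i ; \tag{7}$$

$$V(\hat{Y}_{pps}) = \sum_{i=1}^{N-1}\sum_{j>i}^{N}(\pi_i\pi_j-\pi_{ij})\left(\frac{Y_i}{\pi_i}-\frac{Y_j}{\pi_j}\right)^2 + \sum_{i=1}^{N}V(\hat{Y}_i)/\pi_i ; \text{ and} \tag{8}$$

$$\hat{V}(\hat{Y}_{pps}) = \sum_{i=1}^{n-1}\sum_{j>i}^{n}\frac{(\pi_i\pi_j-\pi_{ij})}{\pi_{ij}}\left(\frac{\hat{Y}_i}{\pi_i}-\frac{\hat{Y}_j}{\pi_j}\right)^2 + \sum_{i=1}^{n}\hat{V}(\hat{Y}_i)/\pi_i ; \tag{9}$$

with the restrictions that:

$$\sum_{i=1}^{N}\pi_i = n ; \text{ and } \sum_{i=1}^{N-1}\sum_{j>i}^{N}\pi_{ij} = n(n-1)/2 .$$


The double summations in equations (8) and (9) are over all distinct pairs of primary units in the
sampling universe (equation (8)) or in the sample (equation (9)).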
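A minimal sketch of the PPS point estimator and its variance estimator follows. It assumes the usual form of the estimator: a pairwise term weighted by $(\pi_i\pi_j-\pi_{ij})/\pi_{ij}$ plus the sum of second-stage variance estimates divided by $\pi_i$. The function name, the indexing of `pi_ij` by position within the sample, and all numbers are hypothetical.

```python
def pps_total(y_hat, v_hat, pi, pi_ij):
    """PPS-without-replacement estimate of the total and its estimated variance.

    y_hat  estimated totals for the n sampled primary units
    v_hat  estimated second-stage variances of those totals
    pi     inclusion probabilities pi_i of the sampled units
    pi_ij  dict mapping (a, b), a < b, positions in the sample, to the
           joint inclusion probability of that pair
    """
    n = len(y_hat)
    total = sum(y / p for y, p in zip(y_hat, pi))
    between = 0.0
    for a in range(n - 1):
        for b in range(a + 1, n):
            joint = pi_ij[(a, b)]  # must be > 0 for the estimator to exist
            weight = (pi[a] * pi[b] - joint) / joint
            between += weight * (y_hat[a] / pi[a] - y_hat[b] / pi[b]) ** 2
    within = sum(v / p for v, p in zip(v_hat, pi))
    return total, between + within


# Hypothetical sample of n = 2 units.
pi = [0.4, 0.6]
pi_ij = {(0, 1): 0.2}
v_hat = [4.0, 6.0]

# Totals exactly proportional to pi_i: the pairwise term vanishes.
proportional = pps_total([40.0, 60.0], v_hat, pi, pi_ij)
# Totals not proportional to pi_i: the pairwise term contributes.
unbalanced = pps_total([50.0, 60.0], v_hat, pi, pi_ij)
print(proportional)
print(unbalanced)
```

The division by `joint` in the pairwise weight is the reason a selection method that yields any joint inclusion probability of zero cannot be used with this variance estimator.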


Equations (1)-(3) and (7)-(9) are formally unbiased when unbiased estimators are used 
at the second stage of sampling. Equations (4)-(6) are biased and approximate, as is 
always the case for ratio estimators. In practice, all formulas will be only approximately 
correct because there are no existing population estimators (used at the second stage 
of sampling) that are unbiased. The formulas can still be used with confidence, however, 
because second-stage error is usually small compared to first-stage error, and BIAS($\hat{Y}_i$)
is usually minor for both removal method and mark-recapture population estimation. 


The particular population estimator used at the second stage of sampling is unspecified 
in the above formulas. Any valid population estimator could be used and formulas would 
be unaffected. The particular population estimator used at the second stage of sampling 
will only affect the form of the equations used to calculate $V(\hat{Y}_i)$ and $\hat{V}(\hat{Y}_i)$.




Appendix 3 


Details of the Realistic 
Application of Alternative 
Designs 




Application of the alternative two-stage sampling designs in a realistic setting required 
(1) construction of a realistic sampling universe; (2) adoption of a particular population 
estimator at the second stage of sampling; and (3) determination of sampling design 
performance when applied to that sampling universe. 


The realistic sampling universe was constructed on the basis of sampling data collected 
from Knowles Creek, a tributary of the Siuslaw River in Oregon. Based on electrofishing/ 
removal method estimation, fishery biologists of the Pacific Northwest Research Station 
provided estimated population sizes and pool sizes (areas, in m²) for pools ranging in
size from about 4 m² to 600 m². These data were used to construct a large sampling
universe (N = 50) by assuming that estimated numbers in pools were exactly equal to 
the numbers actually present in sampled pools. The correlation between fish numbers 
and pool sizes (r = 0.76) is probably a quite reasonable figure to expect. In some cases 
the correlation may be larger, and in other cases it may be less. 


A two-pass electrofishing/removal method estimator was assumed to be used to estimate
population size ($Y_i$) within sampled sections. Letting $C_1$ = number of fish caught on the
first pass; $C_2$ = number of fish caught on the second pass; and q = probability of
capture, then (Seber 1982, sec. 7.2):

$$\hat{Y}_i = \frac{C_1^2}{C_1 - C_2} ; \tag{10}$$

$$V(\hat{Y}_i) \approx \frac{Y_i(1-q)^2(2-q)}{q^3} ; \text{ and} \tag{11}$$

$$\hat{V}(\hat{Y}_i) = \frac{C_1^2 C_2^2 (C_1 + C_2)}{(C_1 - C_2)^4} . \tag{12}$$


Probability of capture was set equal to 0.50, a fairly low figure, so as not to minimize the 
magnitude of second-stage error. 
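The two-pass removal calculations can be sketched as below, assuming the standard two-sample removal forms: the estimate is $C_1^2/(C_1-C_2)$, with the variance formulas that accompany it. The function names and catch figures are hypothetical. The capture-probability estimate $1 - C_2/C_1$ is the usual companion estimate and is included here as an assumption, since it is not stated in this appendix.

```python
def two_pass_removal(c1, c2):
    """Two-pass removal estimate of population size in one primary unit.

    c1, c2  catches on the first and second passes
    Returns (estimated number, estimated capture probability,
             estimated variance of the estimated number).
    """
    if c1 <= c2:
        raise ValueError("estimator is undefined unless c1 > c2")
    y_hat = c1 ** 2 / (c1 - c2)
    q_hat = (c1 - c2) / c1  # assumed companion estimate of capture probability
    v_hat = c1 ** 2 * c2 ** 2 * (c1 + c2) / (c1 - c2) ** 4
    return y_hat, q_hat, v_hat


def removal_variance(y, q):
    """Approximate sampling variance for a unit with y fish and capture probability q."""
    return y * (1 - q) ** 2 * (2 - q) / q ** 3


# Hypothetical catches consistent with q = 0.50: 60 fish, then 30 fish.
y_hat, q_hat, v_hat = two_pass_removal(60, 30)
print(y_hat, q_hat, v_hat)
print(removal_variance(120, 0.5))
```

For these catches the estimated variance and the approximate variance evaluated at the estimated values agree, which gives a quick consistency check on the two formulas.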


Actual application of the three alternative designs involved: 

1. Calculation of sampling variance (or mean square error) for each design as a function 
of sample size using formulas (2) and (11), (5) and (11), and (8) and (11). 

2. Calculation of relative costs for each design (compared to the two-stage SRS design) 
as a function of sample size using the cost function presented in this report. 

3. Calculation of net relative efficiencies for each design (compared to the two-stage 
SRS design) as a function of sample size. 


All necessary calculations were performed on a large time-sharing computer (CYBER) 
using programs written by the author in APL. All designs have also been implemented, 
however, on a less powerful machine (COMPAQ DESKPRO, an IBM-compatible 
microcomputer). 1/ With the exception of drawing large samples for the PPS without
replacement design, the microcomputer would be entirely adequate for all calculations. 


1/ Use of a trade name does not imply endorsement or approval of
any product by the USDA Forest Service to the exclusion of others
that may be suitable.


Appendix 4 


Estimation of Total 
Biomass 


There are many possible alternative methods for estimating the total biomass of fish in 
a small stream. The approach taken in this appendix is the simplest, but not necessarily
the most precise. It will be assumed that within the $i$th selected primary unit a simple
random sample of n; fish is selected and that each of these fish is individually weighed. 
Individual weight measurements are needed to estimate the variance of the estimated 
mean fish weight within a selected primary unit. 


Methods for estimation of total biomass will depend on the sampling design used, but 
all methods require estimation of mean fish weight within each selected primary unit. 
Mean fish weight and variance of mean fish weight within a selected unit can be 
estimated as: 


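$$\hat{\bar{w}}_i = \frac{\sum_{j=1}^{n_i} w_{ij}}{n_i} ; \text{ and}$$

$$\hat{V}(\hat{\bar{w}}_i) = \frac{(1-f_i)\sum_{j=1}^{n_i}(w_{ij}-\hat{\bar{w}}_i)^2}{n_i(n_i-1)} ,$$

where:
$\hat{\bar{w}}_i$ = estimator for mean fish weight in primary unit i;
$w_{ij}$ = weight of the $j$th fish in the $i$th unit, j = 1, 2, ..., $n_i$;
$f_i = n_i/\hat{Y}_i$ = estimated fraction of fish measured in primary unit i; and
$\hat{Y}_i$ = estimated total number of fish in primary unit i.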


The above estimator for the variance of the estimated mean weight is approximate 
because the total number of fish in a primary unit, from which the n; have been selected, 
is unknown and must be estimated using some population estimation method. 
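A short sketch of the within-unit calculation follows, assuming the usual finite-population form in which the sample variance of the weights is divided by $n_i$ and multiplied by $(1 - f_i)$. The function name and the weights are hypothetical.

```python
def mean_weight(weights, y_hat):
    """Estimated mean fish weight in one primary unit and its estimated variance.

    weights  individual weights of the n_i fish measured in the unit
    y_hat    estimated total number of fish in the unit
    """
    n_i = len(weights)
    w_bar = sum(weights) / n_i
    f_i = n_i / y_hat  # estimated fraction of fish measured
    ss = sum((w - w_bar) ** 2 for w in weights)
    v_hat = (1 - f_i) * ss / (n_i * (n_i - 1))
    return w_bar, v_hat


# Hypothetical unit: 4 fish weighed out of an estimated 40 present.
w_bar, v_w = mean_weight([10.0, 12.0, 14.0, 16.0], 40.0)
print(w_bar, v_w)
```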


Case 1.—Primary units selected by simple random sampling: designs A (two-stage 
SRS) and B (ratio estimation). For these two designs, total biomass can be estimated as: 
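$$\hat{B} = \hat{Y}\hat{\bar{w}} ,$$

where:
$\hat{B}$ = estimated total biomass of fish in entire stream;
$\hat{Y}$ = estimated total number of fish in entire stream; and
$\hat{\bar{w}}$ = estimated mean weight of fish in entire stream.


If $\hat{Y}$ and $\hat{\bar{w}}$ are nearly statistically independent, then variance of $\hat{B}$ can be estimated by
(Goodman 1960):

$$\hat{V}(\hat{B}) = \hat{Y}^2\hat{V}(\hat{\bar{w}}) + \hat{\bar{w}}^2\hat{V}(\hat{Y}) - \hat{V}(\hat{Y})\hat{V}(\hat{\bar{w}}) .$$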




The mean weight of fish in the entire stream, $\bar{w}$, is best estimated as:

$$\hat{\bar{w}} = \frac{\sum_{i=1}^{n}\hat{Y}_i\hat{\bar{w}}_i}{\sum_{i=1}^{n}\hat{Y}_i} ;$$


where n = total number of selected primary units. This estimator is a weighted average 
of estimated mean fish weights within selected primary units, where the weighting 
factors are the estimated numbers of fish in selected primary units. 


Because w is based on two-stage sampling, with primary units selected by SRS, 
variance of the overall estimated mean weight can be calculated as (Jessen 1978, 
p. 291-294): 


where $a_i = \hat{Y}_i/(\sum_{i=1}^{n}\hat{Y}_i/n)$.


This estimator is similar to the two-stage estimators of variance presented in appendix 
2. The first term measures variation in estimated mean weights among selected
primary units. The second term measures errors of estimation of mean fish weights
within selected primary units. The coefficients, $a_i$, are adjustments for the number of fish
estimated to be present in a particular sampled unit as compared to the average number 
of fish estimated to be present in a selected primary unit. 


Case 2.—Units selected with probabilities proportional to size: design C (PPS without 
replacement). For this design total biomass within each selected unit is estimated first, 
and then a two-stage estimator is used to estimate total biomass of fish in the entire 
stream and the variance of this estimated total. Within any selected primary unit, total 
biomass can be estimated by: 


$$\hat{B}_i = \hat{Y}_i\hat{\bar{w}}_i ;$$

where: $\hat{B}_i$ = estimated total biomass in primary unit i;
$\hat{Y}_i$ = estimated total number of fish in primary unit i; and
$\hat{\bar{w}}_i$ = estimated mean weight of fish in primary unit i.


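$\hat{B}_i$ involves the product of independently estimated quantities, so Goodman's (1960)
result can be used to give:

$$\hat{V}(\hat{B}_i) = \hat{Y}_i^2\hat{V}(\hat{\bar{w}}_i) + \hat{\bar{w}}_i^2\hat{V}(\hat{Y}_i) - \hat{V}(\hat{Y}_i)\hat{V}(\hat{\bar{w}}_i) .$$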


Formulas for calculating $\hat{\bar{w}}_i$ and $\hat{V}(\hat{\bar{w}}_i)$ are given at the beginning of this appendix
(appendix 4). $\hat{Y}_i$ and $\hat{V}(\hat{Y}_i)$ would be calculated using formulas appropriate for the
particular population estimation method used at the second stage of sampling.


Given estimates of total biomass of fish within each selected primary unit, the total
biomass of fish in the entire stream can be estimated as:

$$\hat{B}_{pps} = \sum_{i=1}^{n}\hat{B}_i/\pi_i ;$$


where $\pi_i$ is the probability that the $i$th primary unit is in a sample of size n primary units
selected by PPS without replacement.


An approximate variance of the estimated total biomass of fish can be calculated 
by substituting estimated biomass within a selected primary unit for estimated total 
number of fish in a primary unit and using equation (9) (appendix 2): 


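$$\hat{V}(\hat{B}_{pps}) \approx \sum_{i=1}^{n-1}\sum_{j>i}^{n}\frac{(\pi_i\pi_j-\pi_{ij})}{\pi_{ij}}\left(\frac{\hat{B}_i}{\pi_i}-\frac{\hat{B}_j}{\pi_j}\right)^2 + \sum_{i=1}^{n}\hat{V}(\hat{B}_i)/\pi_i .$$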


As for equation (9), the summation is over all distinct pairs of primary units that are in 
the sample of n primary units. 
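The biomass calculation for the PPS design can be sketched end to end. The sketch assumes Goodman's (1960) unbiased estimator for the variance of a product of independent estimates, and it reuses the pairwise form of the PPS variance estimator with unit biomass estimates in place of unit totals. The function names, inclusion probabilities, and unit data are all hypothetical.

```python
def unit_biomass(y_hat, v_y, w_bar, v_w):
    """Estimated biomass in one primary unit and its estimated variance.

    y_hat, v_y  estimated number of fish in the unit and its variance
    w_bar, v_w  estimated mean fish weight in the unit and its variance
    The variance uses Goodman's estimator for a product of independent estimates.
    """
    b_hat = y_hat * w_bar
    v_b = y_hat ** 2 * v_w + w_bar ** 2 * v_y - v_y * v_w
    return b_hat, v_b


def pps_biomass(b_hat, v_b, pi, pi_ij):
    """PPS estimate of total stream biomass and its approximate variance.

    pi_ij maps (a, b), a < b, positions in the sample, to joint probabilities.
    """
    n = len(b_hat)
    total = sum(b / p for b, p in zip(b_hat, pi))
    between = 0.0
    for a in range(n - 1):
        for b in range(a + 1, n):
            joint = pi_ij[(a, b)]
            weight = (pi[a] * pi[b] - joint) / joint
            between += weight * (b_hat[a] / pi[a] - b_hat[b] / pi[b]) ** 2
    within = sum(v / p for v, p in zip(v_b, pi))
    return total, between + within


# Hypothetical sample of two units: (y_hat, v_y, w_bar, v_w) for each.
units = [(120.0, 360.0, 13.0, 1.5), (180.0, 400.0, 13.0, 1.0)]
estimates = [unit_biomass(*u) for u in units]
b_hat = [e[0] for e in estimates]
v_b = [e[1] for e in estimates]

total_biomass, var_biomass = pps_biomass(b_hat, v_b, [0.4, 0.6], {(0, 1): 0.2})
print(total_biomass, var_biomass)
```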


If the total biomass of fish within selected primary units is highly correlated with the sizes 
of those primary units, then the PPS design should prove effective for estimating total 
fish biomass in small streams. The performance of this design for estimating biomass 
has not been examined, however, and the above formulas should be regarded as 
preliminary approximations. 






Keywords: Sampling designs, population sampling, fish population, fish habitat. 


The Forest Service of the U.S. Department of 
Agriculture is dedicated to the principle of multiple 
use management of the Nation’s forest resources 
for sustained yields of wood, water, forage, wildlife, 
and recreation. Through forestry research, 
cooperation with the States and private forest 
owners, and management of the National Forests 
and National Grasslands, it strives — as directed by 
Congress — to provide increasingly greater service 
to a growing Nation. 


The U.S. Department of Agriculture is an Equal 
Opportunity Employer. Applicants for all Department 
programs will be given equal consideration without 
regard to age, race, color, sex, religion, or national 
origin. 


Pacific Northwest Research Station 
319 S.W. Pine St.
P.O. Box 3890
Portland, Oregon 97208 




